Understanding the impact of sampling during hyper-parameter optimisation in recommender systems

POLITesi - Digital archive of degree and doctoral theses

With the recent growth of e-commerce, recommender systems have become an essential tool for the development and success of businesses in this sector. Recommender systems require large amounts of computational power and time to find the optimal values for their hyper-parameters, a necessary task before a model can be trained and start making recommendations. Optimising these hyper-parameters is a crucial step in creating a model and directly affects its performance on the dataset. As algorithms become newer and more complex, the number of hyper-parameters keeps increasing, and with it the resources needed for the optimisation phase. In this thesis, we study the effects of tuning the hyper-parameters through a random walk while working with sampled datasets. We investigate two different sampling techniques and test them on four different datasets and five algorithms. To analyse the outcome of tuning the hyper-parameters on a sampled dataset, we observe how the ranking changes compared to the one produced by experiments performed on the whole un-sampled dataset; this comparison highlights interesting results regarding the relationship between the resources needed and performance.

MONTANARI, MATTEO

2020/2021

	Supervisor
	
				CREMONESI, PAOLO
			
	Co-supervisor(s)
	
				BERNARDIS, CESARE
			
	School / Dept.
	
				ING  - Scuola di Ingegneria Industriale e dell'Informazione
			
	Date
	
				28 April 2021
			
	Academic year
	
				2020/2021
			
	Appears in types:
	
				Master's thesis

Attached files

File	Size	Format
Master Thesis Montanari Matteo.pdf (Description: Master Thesis; accessible online only to authorised users)	7.2 MB	Adobe PDF

Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/173782