Micro-clustering for session-based music recommendations

Recommender Systems (RSs) help users discover interesting items in huge product catalogs by providing personalized recommendations that fit their interests and preferences. RSs are known to perform well in different domains, such as e-commerce, music, and video. This thesis focuses on a session-based recommender system in the music domain. Unlike traditional RSs, which leverage the historical behaviour of the user, session-based recommender systems use information about the current user session to predict how it will continue. Compared with other domains, music RSs are characterized by a huge number of items (i.e., music tracks). As a result, datasets are difficult to process and sparsity negatively affects the quality of recommendations. Moreover, many tracks are different versions of the same song. The problem we tackle is the identification and merging of these redundant items. We demonstrate both that these items should not be recommended together and that collapsing them before training a model could enhance performance. We adopt two different approaches to find these items: one we call content-based micro-clustering and the other collaborative micro-clustering. By implementing our techniques in a large-scale music recommender system and adopting a fair evaluation method, we show that merging items before training a model can boost performance and, looking ahead, may also have important implications for diversity: removing too-similar items opens up more slots for other items.

Recommender Systems help users discover interesting items within huge product catalogs by offering personalized recommendations tailored to their interests and preferences. Recommender systems are known for their good performance in various domains, such as e-commerce, music, and video. The thesis concentrates on a session-based recommender system in the music domain. Unlike traditional recommender systems, which rely on the user's past behaviour, session-based recommender systems exploit information about the current session to predict how it will proceed. Compared with other domains, music recommender systems are characterized by an enormous number of items (songs). The data are therefore difficult to process, and sparsity negatively affects the quality of the recommendations. In addition, many tracks are different versions of the same song. The problem we want to address is the identification and merging of these redundant items. We will show both that these items should not be recommended together and that merging them before training a model can improve its performance. We adopted two different approaches to identify these items: the first we called content-based micro-clustering, the second collaborative micro-clustering. By implementing our techniques in a large-scale recommender system and adopting a fair evaluation method, we show that merging items before training the model can boost performance and, looking ahead, may also have important implications for the diversity of recommendations: removing items that are too similar to each other frees up slots for other items that can be recommended.