Automatic playlist generation using recurrent neural networks

Nowadays, music consumption is dominated by streaming services such as Spotify, Tidal and Apple Music. To organize their vast catalogs, these services tend to group the available tracks into playlists, namely sequences of tracks built around a common theme. Traditionally, playlists are created manually, either by users or by professional curators. However, the immediacy of the music experience and the exponentially growing amount of accessible content have created the need for robust automation of the playlist generation process. Several methods have been presented in the literature, usually based on defining a priori the properties that the generated playlists should have. These properties, such as constraints on genres or artists, are known as target characteristics and are used as metrics to assess the quality of the resulting playlist. However, since playlist composition is a subjective process that depends on the author's musical taste, it is not possible to define a unique, undisputed set of quality attributes for it. In this study, we propose an automatic playlist generation approach that analyzes hand-crafted playlists, learns their structure and generates new playlists accordingly. Our approach draws inspiration from language modeling techniques, since a playlist can be seen as a text in which songs play the role of words. When we compose a well-structured playlist, we select tracks that reflect an evolution of characteristics such as genre or mood. This is quite similar to what we do when we speak or write: we combine words in a way that develops concepts. Our approach aims to capture this evolution by modeling human-crafted sequences of songs. To address this task, we adopt a deep learning architecture, specifically a Recurrent Neural Network (RNN), which is specialized in sequence modeling.
The resulting model has been tested in two settings, playlist generation and playlist continuation, to assess its ability to capture the structure of human-crafted playlists.
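The analogy above (tracks as words, playlists as sentences) can be illustrated with a minimal pure-Python sketch of an RNN next-track model. It is not the model used in this work: the cell type (a plain Elman RNN here), the hidden size, the toy track IDs and all function names are hypothetical, and the weights are random and untrained, so the sampled tracks carry no meaning. Training (for example, minimizing the cross-entropy of the next track via backpropagation through time) is omitted. The same sampling loop serves both test settings: continuation feeds an existing playlist as the seed, while generation could start from a single seed track (an assumption; how generation is seeded is not stated here).

```python
import math
import random

# Hypothetical toy data: each track ID plays the role of a word.
PLAYLISTS = [
    ["track_a", "track_b", "track_c"],
    ["track_b", "track_c", "track_d"],
]


def build_vocab(playlists):
    """Map every distinct track to an integer index, like a word vocabulary."""
    vocab = {}
    for playlist in playlists:
        for track in playlist:
            if track not in vocab:
                vocab[track] = len(vocab)
    return vocab


def softmax(xs):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ElmanRNN:
    """Illustrative Elman RNN next-track model with random, untrained weights."""

    def __init__(self, vocab_size, hidden_size=8, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.W_xh = init(hidden_size, vocab_size)   # one-hot track -> hidden
        self.W_hh = init(hidden_size, hidden_size)  # hidden -> hidden (recurrence)
        self.W_hy = init(vocab_size, hidden_size)   # hidden -> next-track scores

    def step(self, h, track_idx):
        """Consume one track; return the new hidden state and next-track probabilities."""
        new_h = []
        for i in range(self.hidden_size):
            # Multiplying W_xh by a one-hot vector just selects column track_idx.
            s = self.W_xh[i][track_idx]
            s += sum(self.W_hh[i][j] * h[j] for j in range(self.hidden_size))
            new_h.append(math.tanh(s))
        logits = [sum(self.W_hy[k][j] * new_h[j] for j in range(self.hidden_size))
                  for k in range(self.vocab_size)]
        return new_h, softmax(logits)


def continue_playlist(model, seed_indices, n_new, rng):
    """Feed the seed tracks through the RNN, then sample n_new further tracks."""
    if not seed_indices:
        raise ValueError("need at least one seed track")
    h = [0.0] * model.hidden_size
    probs = None
    for idx in seed_indices:
        h, probs = model.step(h, idx)
    out = list(seed_indices)
    for _ in range(n_new):
        nxt = rng.choices(range(model.vocab_size), weights=probs)[0]
        out.append(nxt)
        h, probs = model.step(h, nxt)  # the sampled track becomes the next input
    return out


vocab = build_vocab(PLAYLISTS)
inv = {i: t for t, i in vocab.items()}
model = ElmanRNN(len(vocab))
seed = [vocab["track_a"], vocab["track_b"]]
result = continue_playlist(model, seed, n_new=3, rng=random.Random(42))
print([inv[i] for i in result])
```

In a trained model, the hidden state `h` is what would carry the evolution of genre or mood along the sequence; sampling from the softmax rather than always taking the most likely track is one common way to keep generated playlists from repeating a single pattern.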

The music consumption sector has undergone a profound transformation in recent decades. The concept of owning a musical work has given way to access to content, offered mostly by streaming services such as Spotify or Apple Music. The catalogs these services make available to users are vast and highly diverse. To make them accessible, the available tracks are often organized into playlists. A playlist is usually composed by hand by expert curators or passionate listeners, who select a particular sequence of tracks according to an underlying theme. The enormous amount of available music makes manual playlist composition a time-consuming task. For this reason, automating playlist generation has become necessary. This process usually starts by imposing a set of basic characteristics that the final playlist should have. Such characteristics, for example constraints on the genres or artists to include, serve as a qualitative measure of the playlist. All of this, however, ignores the subjective nature of a playlist, for which no absolute quality measures exist. In this thesis, we focus on the analysis of playlists composed by humans, presenting a model able to derive their structure and emulate it in its generation process. This study takes inspiration from generative language models. Indeed, our human ability to compose playlists is comparable to the way we express ourselves through language. Instead of words, songs are combined in order to convey an evolution of characteristics, such as musical genres or moods.
Our model, designed to capture this evolution, is based on a deep learning architecture, a Recurrent Neural Network, which is specialized in sequence modeling and performs it by emulating the structures of human thought. To verify the validity of the model, we tested both its ability to generate playlists and its ability to continue existing ones.