Music recommendation system based on audio segmentation and feature evolution

The advent of new technologies such as the Internet, digital audio formats, and portable media players has made it easier to produce and distribute music. The amount of music on offer has grown enormously, but finding songs that suit a user's tastes has become harder. The development of techniques and tools for browsing, searching, and organizing music is therefore an important research topic today. Music recommendation systems are one solution to this problem. They generate playlists according to the similarity between a chosen track and the tracks in a predefined music collection. Similarity is generally based either on the physical, perceptual, and acoustic properties of the audio signal (content-based approach) or on manually defined tags (context-based approach). Content information is obtained using Multimedia Information Retrieval techniques that extract descriptors such as rhythm, harmony, or loudness from a song. Songs can then be compared using algorithms specialized in finding similarities between the extracted features, and the matching items are proposed to the user as a playlist. The purpose of this thesis is to extend an existing content-based music recommendation system, producing an application that generates playlists according to the acoustic features of the audio being played. The application works in both a local and a web environment, using a client-server infrastructure in which the recommendation engine is not tied to the player. Its core components are exchangeable plugins. Similarity is measured with two different approaches: local and global. In the local approach, a song is segmented into a sequence of "cells" that represent highly homogeneous parts, and similarity is evaluated over the descriptors of these cells. In the global approach, similarity is computed over the whole song using the evolution over time of the extracted features. In both approaches, users can interact with the system by specifying the desired feature values in order to improve the output.
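
The difference between the local and the global approach can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not specify the distance measures, so the Euclidean distance, the nearest-cell averaging, the linear resampling, and all names (`local_distance`, `global_distance`, `resample`, `recommend`) are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_distance(cells_a, cells_b):
    """Local approach (assumed form): each song is a list of per-cell
    descriptor vectors. Every cell of song A is matched to its nearest
    cell in song B, and the matched distances are averaged."""
    nearest = [min(euclidean(ca, cb) for cb in cells_b) for ca in cells_a]
    return sum(nearest) / len(nearest)


def resample(series, n):
    """Linearly interpolate a feature trajectory to n points (n >= 2),
    so that songs of different lengths can be compared point by point."""
    if len(series) == 1:
        return [series[0]] * n
    out = []
    for i in range(n):
        pos = i * (len(series) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(series) - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


def global_distance(track_a, track_b, n=16):
    """Global approach (assumed form): compare how one feature evolves
    over the whole song, after resampling both trajectories to n points."""
    return euclidean(resample(track_a, n), resample(track_b, n)) / math.sqrt(n)


def recommend(seed, library, distance, k=2):
    """Return the names of the k library items closest to the seed."""
    ranked = sorted(library, key=lambda name: distance(seed, library[name]))
    return ranked[:k]


# Hypothetical data: two descriptors per cell (local) and one loudness
# trajectory per song (global).
seed_cells = [[0.8, 0.2], [0.6, 0.4]]
library_cells = {
    "calm": [[0.1, 0.9], [0.2, 0.8]],
    "energetic": [[0.7, 0.3], [0.9, 0.1]],
}
seed_loudness = [0.2, 0.5, 0.9]
library_loudness = {
    "rising": [0.1, 0.3, 0.6, 0.8, 1.0],
    "falling": [1.0, 0.6, 0.2],
}

print(recommend(seed_cells, library_cells, local_distance, k=1))
print(recommend(seed_loudness, library_loudness, global_distance, k=1))
```

In this sketch the two approaches share the same ranking step and differ only in the distance function passed to `recommend`, which loosely mirrors the idea of exchangeable components described above.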

With the advent of the Internet, digital audio formats, and portable music players, the production and distribution of musical content has accelerated considerably. The abundance of music on offer has forced a rethinking of the way we find the music that interests us. In recent years, the development of techniques and tools for searching and organizing musical content has therefore been gaining considerable technical importance. Music recommendation systems are one of the solutions to this problem. They generate playlists according to similarity criteria between a track and a collection of musical content. This similarity is based on the physical, perceptual, and acoustic properties of the audio signal ("content-based" approach) and on manually defined tags ("context-based" approach). Content information is obtained through "Multimedia Information Retrieval" techniques that extract descriptors from a track, such as tempo, harmony, or loudness. Tracks are compared through similarity functions over the extracted features, and the playlist proposed to the user is populated on the basis of these functions. The purpose of this thesis is to extend an existing content-based music recommendation system. The application can generate playlists according to the acoustic properties of the audio being played. It can run both locally and on the web, using a platform-independent client/server infrastructure. Its functionality is implemented through interchangeable plugins. Similarity is computed in two ways: local and global. In the local mode, the track is segmented into homogeneous "cells" from which the features to be compared are extracted. In the global mode, similarity depends on the evolution over time of the features of the whole track. In both cases, the user can interact by setting feature values to improve the output.