Automatic audio compositing system based on music information retrieval

In the past few years, music recommendation and playlist generation systems have become one of the most promising research areas in the field of audio processing. Thanks to the wide diffusion of the Internet, users can collect and store a substantial amount of musical data and use it in everyday life through portable media players. The challenge for modern recommendation systems is how to process this huge amount of data in order to extract useful descriptors of the musical content, i.e. how to perform automatic tagging, cataloguing, and indexing of media material. This information may be used for many purposes: media search, media classification, market suggestions, media similarity measurements, etc. Until now, the traditional approach to this problem has been audio labelling, which consists of defining symbolic descriptors that can be used to generate the playlist. Examples are playlists based on music genre or artist name. This approach has some strong limitations. First, since labels are usually treated as descriptors of the whole musical piece, they cannot capture mood or genre changes within the same song. Moreover, classification by label sometimes results in heterogeneous classes (e.g. pieces belonging to the same genre can be very different from one another). This thesis fits into this context: it studies and develops a music recommendation framework that allows the user to interact by means of more precise descriptors. The system intelligently recommends items from an audio database on the basis of the user's preferences. Music Information Retrieval techniques are used to extract significant features from the audio signal and to let the user interact with the system through high-level interfaces such as musical tempo or timbral features.
In describing the system, we will demonstrate the generality of the approach by presenting some of the many applications that can be derived from the framework: an automatic DJ system, a tabletop interaction system, a playlist generation system based on a runner's step frequency, and a training-based recommendation system. The goal of this project is not only the development of a technically valid product but also an exploration of its artistic applications. The system is addressed to a broad audience of performers (DJs, contemporary music performers, ...), composers, and amateurs.

In recent years, music recommendation and dynamic playlist generation systems have become extremely promising research areas. Thanks to the wide diffusion of the Internet, users can store a substantial collection of musical data and use it in everyday contexts through portable music players. The problem facing modern music recommendation systems is how to process this large quantity of data and extract meaningful descriptors of the content; this information can be used for many purposes: music search, classification, commercial suggestions, or audio similarity measures. Until now, the traditional approach to the problem has been audio labelling. This operation consists of defining symbolic descriptors that can be used to generate the playlist. Examples of this kind are playlists based on musical genre or artist name. This approach, however, has some strong limitations: first of all, labels are generally treated as descriptors of the entire musical piece and do not account for changes of genre or mood within the same song. In addition, classification by label often results in very heterogeneous classes; for example, music belonging to the same genre can have very different characteristics. This thesis fits into this context and consists of the study and development of a music recommendation framework that lets the user interact through a set of more precise descriptors. The system intelligently recommends musical pieces on the basis of the user's preferences. Music Information Retrieval techniques are used to extract significant features directly from the musical signal and to let the user interact with the system through high-level interfaces such as tempo or timbral features.
During the description of the system, we will demonstrate the generality of the approach by describing some of the many applications that can be derived from the framework: an automatic DJ system, a tabletop interaction system, a dynamic playlist generator based on the step frequency of a running person, and a learning-based recommendation system. The goal of this project is not only the development of a technically valid platform but also the analysis of the artistic applications the system may find. It is in fact addressed to a broad audience of performers (DJs, contemporary music performers, ...), composers, and amateurs.