Analysis of musical structure: an approach based on deep learning

A distinctive trait of the digital era is easy access to an enormous quantity of music content. Music Information Retrieval (MIR) is a broad research field whose main aim is to extract salient information from audio signals, which greatly facilitates the organization of content in large libraries. Music Structure Analysis is one of the topics in MIR. Its purpose is to automatically retrieve the structure of a song at the largest temporal scale, i.e., its division into structural parts such as the chorus and the verse. Structural analysis benefits several applications, for example the improvement of auto-tagging systems or the generation of audio thumbnails, i.e., representative summaries of a song. Music Structure Analysis mainly focuses on detecting how certain characteristics vary over time along the piece; commonly used characteristics include harmony, timbre, and rhythm. However, selecting these features is often problematic, since one must know in advance which of them are most effective for the task and then build procedures to compute them. The resulting descriptors are generally called \textit{hand-crafted}, since they are specifically designed to represent the sought properties. Deep learning techniques offer an alternative approach: they automatically obtain an abstract representation of the data without explicit knowledge of the salient features to extract. Since deep learning has proved effective in several areas, in this work we investigate its use in Music Structure Analysis. More precisely, we use a Deep Belief Network to extract a sequence of descriptors that is subsequently given as input to several Music Structure Analysis algorithms from the literature. 
Finally, we compare the performance of the learned descriptors with that of commonly used hand-crafted features.
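The abstract does not specify the network configuration. Purely as an illustration, a Deep Belief Network can be pre-trained greedily as a stack of restricted Boltzmann machines (RBMs), each trained with one-step contrastive divergence (CD-1); the hidden activations of the top layer then give one descriptor per audio frame. This is a minimal sketch assuming frame-wise input features already scaled to [0, 1] (e.g., a normalized spectrogram) and Bernoulli units in every layer; the names `RBM`, `train_dbn`, and `extract_descriptors`, the layer sizes, and all hyperparameters are hypothetical and are not the configuration used in this work.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_vis = np.zeros(n_visible)
        self.b_hid = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1_step(self, v0, lr):
        """One contrastive-divergence update on a mini-batch v0 of shape (n, n_visible)."""
        h0 = self.hidden_probs(v0)                      # positive phase (data-driven)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)              # one-step reconstruction
        h1 = self.hidden_probs(v1)                      # negative phase
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_vis += lr * (v0 - v1).mean(axis=0)
        self.b_hid += lr * (h0 - h1).mean(axis=0)


def train_dbn(frames, layer_sizes, epochs=20, lr=0.1, batch=32, seed=0):
    """Greedy layer-wise pre-training: each RBM models the output of the previous one."""
    rng = np.random.default_rng(seed)
    rbms, data = [], frames
    for n_hidden in layer_sizes:
        rbm = RBM(data.shape[1], n_hidden, rng)
        for _ in range(epochs):
            order = rng.permutation(len(data))
            for start in range(0, len(data), batch):
                rbm.cd1_step(data[order[start:start + batch]], lr)
        rbms.append(rbm)
        data = rbm.hidden_probs(data)  # becomes the training input of the next layer
    return rbms


def extract_descriptors(rbms, frames):
    """Propagate frames through the stack; the top layer yields one descriptor per frame."""
    out = frames
    for rbm in rbms:
        out = rbm.hidden_probs(out)
    return out


# Hypothetical usage: 200 frames of a 40-bin feature scaled to [0, 1].
frames = np.random.default_rng(1).random((200, 40))
rbms = train_dbn(frames, layer_sizes=[64, 16], epochs=5)
descriptors = extract_descriptors(rbms, frames)
print(descriptors.shape)  # (200, 16)
```

For real-valued inputs such as spectrogram magnitudes, a Gaussian-Bernoulli RBM in the first layer is a common alternative to the Bernoulli visible units assumed here; no supervised fine-tuning stage is shown.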

A distinctive trait of the digital era is the possibility of easily accessing an enormous quantity of music content. Music Information Retrieval (MIR) is a broad research field whose main objective is to extract salient information from audio signals, which greatly facilitates the organization of content in large libraries. Music Structure Analysis is one of the applications of MIR. Its objective is to automatically extract the structure of songs at the highest temporal scale, i.e., their division into structural parts such as the chorus and the verse. Analyzing the structure of songs brings benefits in several areas: for example, improving auto-tagging systems or generating audio previews, i.e., representative summaries of songs. Music Structure Analysis is mainly concerned with identifying the temporal variation of certain properties along the piece of music. Commonly used characteristics include harmony, timbre, and rhythm. However, selecting these features is often a problematic procedure, since it is necessary to know which ones are the most effective for the task and, consequently, to develop procedures to compute them. The resulting descriptors are generally called \textit{hand-crafted}, since they are specifically designed to represent the sought properties. An alternative approach is represented by deep learning techniques, which can automatically obtain an abstract representation of the data without explicit knowledge of the salient features to extract. Since deep learning techniques have proved effective in several areas, in this work we investigate their use in Music Structure Analysis. 
More precisely, we use a Deep Belief Network to extract descriptors that are subsequently given as input to several Music Structure Analysis algorithms presented in the literature. Finally, we compare the performance of the obtained descriptors with that of the commonly used hand-crafted features.
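Structure analysis algorithms typically look for the points where a descriptor sequence changes over time. As one classic example of such an algorithm (not necessarily one of those evaluated in this work), the sketch below computes a cosine self-similarity matrix from any per-frame descriptor sequence, learned or hand-crafted, correlates a Gaussian-tapered checkerboard kernel along its main diagonal (the novelty measure usually attributed to Foote), and picks peaks of the resulting novelty curve as segment boundaries. The function names, the kernel half-width, and the peak threshold are hypothetical choices.

```python
import numpy as np


def self_similarity(features):
    """Cosine self-similarity matrix of a (n_frames, n_dims) descriptor sequence."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    return unit @ unit.T


def checkerboard_kernel(half):
    """Gaussian-tapered checkerboard kernel of size 2*half."""
    sign = np.concatenate([-np.ones(half), np.ones(half)])
    kernel = np.outer(sign, sign)  # +1 on past/past and future/future, -1 across
    axis = np.arange(-half, half) + 0.5
    taper = np.exp(-0.5 * (axis / (0.5 * half)) ** 2)
    return kernel * np.outer(taper, taper)


def novelty_curve(ssm, half=8):
    """Slide the kernel along the diagonal; frame i is the first 'future' frame."""
    n = ssm.shape[0]
    kernel = checkerboard_kernel(half)
    padded = np.pad(ssm, half, mode="constant")
    novelty = np.empty(n)
    for i in range(n):
        window = padded[i:i + 2 * half, i:i + 2 * half]
        novelty[i] = np.sum(window * kernel)
    return novelty


def pick_boundaries(novelty, threshold):
    """Local maxima of the novelty curve that exceed a threshold."""
    return [i for i in range(1, len(novelty) - 1)
            if novelty[i] > threshold
            and novelty[i] >= novelty[i - 1]
            and novelty[i] > novelty[i + 1]]


# Synthetic example: two homogeneous sections of 50 frames each.
feats = np.vstack([np.tile([1.0, 0.0], (50, 1)), np.tile([0.0, 1.0], (50, 1))])
nov = novelty_curve(self_similarity(feats), half=8)
print(pick_boundaries(nov, threshold=0.5 * nov.max()))  # [50]
```

Because the kernel entries sum to zero, the novelty is close to zero inside homogeneous regions and peaks where the similar-to-the-past pattern switches to similar-to-the-future, which is exactly the kind of temporal variation structure analysis relies on.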