Beat Tracking Using Recurrent Neural Networks: A Transfer Learning Approach

In recent decades we have witnessed crucial changes in the way music is consumed. With the increasing popularity of streaming services, which today are the main music content providers, the amount of available music has grown exponentially. To provide suggestions to their users, content providers need to classify their vast catalogs, so meaningful information has to be extracted from each musical piece. This need laid the foundation of Music Information Retrieval (MIR), a research field whose goal is to find approaches for automatically retrieving information from musical excerpts. One of the most characteristic aspects of a musical piece is its rhythm, whose basic element is the beat. While beat perception is an almost instinctive capability for human beings, its machine-based equivalent is a non-trivial task. Beat Tracking, an important research area within MIR, aims to automatically extract the beat from a musical piece. Several methods have been proposed, and one of the most promising involves Deep Neural Networks, owing to their ability to emulate, to some extent, the way the human mind learns. The effectiveness of deep learning models is limited by the amount and variety of the data used for training. For this reason, deep learning models are best suited to scenarios where a large amount of annotated data is available. In MIR this is the case for popular genres, for which annotated datasets are widely available. By contrast, finding sufficient data is hard for less widespread genres such as folk music. A recent approach for overcoming the need for large datasets is transfer learning. It is based on mimicking the ability of the human brain to address novel problems by applying knowledge acquired while solving similar problems in different contexts. In this work, we propose an approach that applies transfer learning to beat tracking. We use a deep RNN trained on popular music as the starting network, and we transfer it to track beats in folk music.
We also test whether the resulting models can deal with highly variable music. To evaluate the effectiveness of our approach, we collected a dataset of Greek folk music and manually annotated the pieces.

In recent decades we have witnessed a revolution in music consumption. With the advent of music streaming services, users have seen an exponential growth in the music available to them. Organizing this content in a targeted way is made possible by extracting meaningful information from musical pieces. The need to extract information from audio automatically laid the foundations of the research area of Music Information Retrieval (MIR). In particular, this study focuses on aspects related to the rhythmic structure of musical pieces, whose basic element is the beat. While beat perception is an instinctive ability for humans, automating it is not a trivial task. Indeed, automatic Beat Tracking is one of the most important areas of study within MIR. Among the proposed methods, one of the most promising involves neural networks. The effectiveness of these models is limited by the amount of data used for training. Therefore, deep learning networks can be trained and applied effectively in scenarios where huge amounts of data are available. In MIR, plenty of data is available for widespread musical genres, but for niche genres such as folk music the availability of usable data drops considerably, which makes training and using deep learning models difficult. A recent approach for overcoming this problem is called transfer learning. It is based on mimicking the ability of the human brain to tackle new problems by applying knowledge acquired while solving similar problems in different contexts. In this work, we propose an approach to beat tracking based on transfer learning. In particular, we use an RNN trained on popular music as the starting network, and we transfer the knowledge it acquired in this preliminary phase to track beats in folk music.
We also test whether the resulting models can handle highly variable music. To evaluate the effectiveness of our approach, we collected and manually annotated a dataset of Greek folk music.