Modeling harmonic and rhythmic complexity for applications of music information retrieval
DI GIORGI, BRUNO
Abstract
The word "complex" is often used to describe something counterintuitive and unpredictable. In the context of art, artistic languages can be used in more or less complex ways to create and maintain interest. But what exactly defines complexity? It can be argued that obeying or violating common patterns and expectations affects the complexity of a work of art. Music is no exception, with its many languages that unfold through time, such as harmony, rhythm, melody, orchestration and timbre. In this work we focus specifically on harmony and rhythm and analyze some of the characteristics that influence complexity. To do so, we first study the relevant music descriptors, such as chords, keys and beats, proposing new models to automatically extract these properties from the audio signal. Subsequently, we propose data-driven and model-based methods for estimating complexity from symbolic representations of harmony and rhythm. Signal processing, machine learning techniques and music theory are used throughout this work to achieve these goals. The main contributions of the thesis are subdivided into two parts: the first addresses harmony and contains our work on chord and key extraction as well as the estimation of harmonic complexity; the second is devoted to rhythm analysis and includes our work on the estimation of beat instants and rhythmic complexity. As far as harmony is concerned, we begin by describing our chord and key recognition system. We focus on one particular aspect of Western pop and rock music that is arguably overlooked in the related literature: the diatonic modes. Two modes in particular, Dorian and Mixolydian, complement and connect the two well-known opposite poles, the major (Ionian) and minor (Aeolian) modes. We incorporate these modes and provide a novel, musically meaningful parameterization of a known dynamic Bayesian network approach. These variations increase the accuracy of the system, as shown by the results.
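To illustrate why modeling the diatonic modes matters, the sketch below matches a chroma vector against weighted pitch-class templates for the four modes named above. This is only a hypothetical template-matching toy, not the dynamic Bayesian network parameterization described in the thesis; the tonic and fifth weights are assumptions introduced here so that relative modes, which share the same pitch-class set (e.g. C Ionian and A Aeolian), can still be told apart.

```python
# Scale degrees (semitones above the tonic) of the four diatonic modes
# discussed in the text; Dorian and Mixolydian sit between the Ionian
# (major) and Aeolian (minor) poles.
MODES = {
    "ionian":     (0, 2, 4, 5, 7, 9, 11),
    "dorian":     (0, 2, 3, 5, 7, 9, 10),
    "mixolydian": (0, 2, 4, 5, 7, 9, 10),
    "aeolian":    (0, 2, 3, 5, 7, 8, 10),
}

def mode_templates():
    """One 12-dimensional pitch-class template per (tonic, mode) pair.

    Tonic and fifth are weighted above the other degrees (an assumption of
    this toy) so that relative modes remain distinguishable.
    """
    templates = {}
    for mode, degrees in MODES.items():
        base = [0.0] * 12
        for d in degrees:
            base[d] = 1.0
        base[0] = 2.0   # emphasize the tonic
        base[7] = 1.5   # emphasize the fifth
        for tonic in range(12):
            # Rotate the template so that pitch class `tonic` is the root.
            templates[(tonic, mode)] = [base[(pc - tonic) % 12]
                                        for pc in range(12)]
    return templates

def best_key(chroma):
    """Return the (tonic, mode) pair whose template best matches `chroma`."""
    scores = {key: sum(w * c for w, c in zip(tpl, chroma))
              for key, tpl in mode_templates().items()}
    return max(scores, key=scores.get)
```

A chroma vector that accumulates the white keys with extra energy on C and G is matched to C Ionian, while the same pitch-class set with energy on A and E is matched to A Aeolian, despite both using identical pitch classes.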
We then analyze how the expectations formed when listening to tonal chord sequences influence perceived harmonic complexity. We model expectations by training three different language models, namely prediction by partial matching, a hidden Markov model and recurrent neural networks, on a novel large dataset containing half a million annotated chord sequences. We then train a compound model and use it to generate a set of chord sequences that we include in a listening test. Results show a strong relation between the negative log probability of the chord sequences, as given by our language model, and the subjects' complexity ratings. As far as rhythmic information is concerned, we focus on beats, which are often localized given an onset detection function extracted from the audio signal, as well as the tempo path estimated from it. The tracking strategy required for estimating the sequence of beat instants from such descriptors is usually based on dynamic programming algorithms. We propose a novel strategy based on the efficient generation and joint steering of multiple simple trackers. Although the method performs a heuristic search, as opposed to the full search of the dynamic programming approach, this solution is shown to lead to improved computational efficiency. The method is also compared with a broader set of state-of-the-art solutions, in order to offer a more general analysis. Finally, we review some of the models for estimating rhythmic complexity from symbolic representations. We focus on the class of rhythms with unusual time signatures, which are common in some Western genres and in non-Western musical cultures. We propose a novel model, generalizing concepts such as beat induction, almost maximal evenness and weighted metrical hierarchy to this class of rhythms. To validate our model, we conducted a test in which subjects were asked to tap along with the rhythms, while their performances were recorded and measured.
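The relation between a sequence's negative log probability and its perceived complexity can be illustrated with a toy Laplace-smoothed bigram model. The thesis trains prediction by partial matching, hidden Markov and recurrent neural network models; the bigram below is only a minimal stand-in, and the chord symbols and training loop are invented for the example.

```python
import math
from collections import Counter

def train_bigram(sequences, smoothing=1.0):
    """Collect Laplace-smoothed bigram statistics over chord symbols."""
    pair_counts = Counter()
    context_counts = Counter()
    vocab = set()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
            vocab.update((prev, cur))
    return pair_counts, context_counts, vocab, smoothing

def neg_log_prob(model, seq):
    """Average negative log2 probability per chord transition.

    Under the hypothesis discussed in the text, higher values indicate
    less expected, i.e. more complex, harmonic progressions.
    """
    pair_counts, context_counts, vocab, k = model
    transitions = list(zip(seq, seq[1:]))
    nll = 0.0
    for prev, cur in transitions:
        # Laplace smoothing keeps unseen transitions at non-zero probability.
        p = (pair_counts[(prev, cur)] + k) / (context_counts[prev] + k * len(vocab))
        nll -= math.log2(p)
    return nll / len(transitions)
```

Trained on repetitions of the common C–G–Am–F loop, the model assigns a lower average negative log probability (lower predicted complexity) to that familiar progression than to a progression full of transitions it has never seen.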
Early results from the performance test show that our model estimates rhythmic complexity more accurately than the other models found in the literature. The overall contribution of the thesis comprises models for the estimation of musically relevant information, such as chords, keys and beats, from the audio signal, as well as models for the estimation of perceived harmonic and rhythmic complexity, given the symbolic representation of related musical elements, such as chord and onset sequences. Ancillary contributions include the release of the annotations of chords, keys and beats created for our experiments. The annotations have received support from other researchers and have been used for international evaluation campaigns such as the Music Information Retrieval Evaluation eXchange (MIREX). Also, the source code of some of our models has been shared with the related publications or through public repositories.
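The weighted metrical hierarchy mentioned above, extended to unusual time signatures, can be sketched with a toy syncopation-style measure. This is an illustration only, not the model proposed in the thesis: the two-level weighting scheme and the scoring rule are assumptions of this example.

```python
def metrical_weights(grouping):
    """Metrical weights over the eighth-note positions of one bar.

    `grouping` lists the beat groups of the bar, e.g. (2, 2, 3) for a 7/8
    bar felt as 2+2+3. In this toy hierarchy the downbeat is strongest (2),
    each later group onset is next (1), and subdivisions are weakest (0).
    """
    weights = []
    for i, group in enumerate(grouping):
        weights.append(2 if i == 0 else 1)
        weights.extend([0] * (group - 1))
    return weights

def onset_complexity(onsets, grouping):
    """Score a rhythm by how far its onsets fall from strong positions.

    Each onset contributes (strongest weight - weight at its position), so
    rhythms that avoid strong metrical positions score as more complex.
    """
    weights = metrical_weights(grouping)
    top = max(weights)
    return sum(top - weights[pos] for pos in onsets)
```

In a 2+2+3 bar the weights are [2, 0, 1, 0, 1, 0, 0]; a rhythm striking the three group onsets scores lower (less complex) than one striking only the off-group positions.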
File: thesis.pdf (thesis text, Adobe PDF, 2.57 MB; not accessible)
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/134429