Towards joint music source separation and deep drums demixing

POLITesi - Digital archive of degree and doctoral theses

Music Source Separation (MSS) algorithms are widely used, allowing artists and researchers alike to separate mixed tracks into stems, typically four: vocals, bass, drums, and other. Drums Demixing (DDX) is a subfield of MSS that aims to obtain isolated stems for the individual drum kit parts from a drums mixture. Currently, extracting both instrumental and drum stems from a song requires first applying an MSS algorithm and then applying a DDX model to the drums stem isolated by the former. The goal of this thesis is to evaluate the performance of a single model that performs MSS and DDX jointly, comparing it with current two-stage approaches. To do so, two state-of-the-art architectures were trained and tested as MSS (4-stem), DDX (5-stem), and MSS+DDX (8-stem) models. Results show that single-stage configurations are less effective than their two-stage counterparts in terms of signal-to-distortion ratio, while requiring roughly half the inference time on the same hardware, highlighting a trade-off between separation performance and computational requirements.
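As a rough illustration of the stem counts and of the metric named in the abstract, the following minimal Python sketch shows how a 4-stem MSS layout and a 5-stem DDX layout combine into 8 joint stems, and how a basic signal-to-distortion ratio can be computed. The stem names, the function names `joint_stems` and `sdr_db`, and the plain energy-ratio form of SDR are assumptions made for illustration; the abstract gives only the stem counts and does not state which SDR variant the thesis uses.

```python
import math

# Hypothetical stem names: the abstract only states the counts (4, 5, 8).
MSS_STEMS = ["vocals", "bass", "drums", "other"]
DDX_STEMS = ["kick", "snare", "toms", "hihat", "cymbals"]


def joint_stems(mss_stems, ddx_stems, drums_key="drums"):
    """Replace the single drums stem with the individual drum stems."""
    out = []
    for name in mss_stems:
        if name == drums_key:
            out.extend(ddx_stems)  # 1 drums stem becomes len(ddx_stems) stems
        else:
            out.append(name)
    return out


def sdr_db(reference, estimate, eps=1e-12):
    """Basic SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    This is the plain energy-ratio definition, assumed here for
    illustration; it is not necessarily the variant used in the thesis.
    """
    signal_power = sum(r * r for r in reference)
    error_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps guards against division by zero and log of zero.
    return 10.0 * math.log10((signal_power + eps) / (error_power + eps))
```

With these assumed layouts, `joint_stems(MSS_STEMS, DDX_STEMS)` yields 8 names (4 - 1 + 5), and an estimate equal to 0.9 times the reference signal scores about 20 dB, since the error carries 1/100 of the reference energy.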

COLOTTI, FRANCESCO

2023/2024

Advisor: Bernardini, Alberto
Co-advisor(s): MEZZA, ALESSANDRO ILIC
School / Dept.: ING - Scuola di Ingegneria Industriale e dell'Informazione
Date: 9 Oct 2024
Academic year: 2023/2024
Document type: Master's thesis (Tesi di laurea Magistrale)

Attached files

File	Description	Access	Size	Format
2024_10_Colotti_tesi_01.pdf	thesis, article format	not accessible	1.17 MB	Adobe PDF
2024_10_Colotti_executive_summary_02.pdf	executive summary	not accessible	512.79 kB	Adobe PDF

Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/227677