HRTF personalization based on denoising diffusion models
Albarracín Sánchez, Juan Camilo
2023/2024
Abstract
The field of virtual auditory displays is continuously growing, and personalization of the Head-Related Transfer Function (HRTF) is of great importance in this field: because HRTFs are highly individual, personalized ones are essential for achieving immersive and realistic audio experiences. Traditional methods for obtaining accurate HRTFs involve cumbersome acoustic measurement procedures in specialized environments; generic HRTFs can be used instead, but they often result in poor auditory experiences. In this study, we investigate HRTF personalization using Denoising Diffusion Probabilistic Models (DDPMs), which are known for their outstanding ability to generate high-quality samples from noise. Our approach generates personalized HRTFs through their time-domain representation, the Head-Related Impulse Response (HRIR), combined with spatial information about the sound source and anthropometric features of the subject. We inject noise into the HRIRs following a predetermined schedule and train a U-Net to denoise them based on this conditional information. To validate the method, we followed a leave-one-out cross-validation procedure, which allowed us to leverage the existing HRTF data while testing the method on multiple subjects. As evaluation measures, we adopted the following objective metrics: the Mean Absolute Error (MAE) for the HRIR onset and Interaural Time Difference (ITD) errors, and the Normalized Mean Squared Error (NMSE) and Log-Spectral Distortion (LSD) to measure the spectral similarity between the predicted HRTFs and the ground truth. The results resemble the ground-truth HRTFs in both the time and frequency domains and achieve promising performance in terms of MAE and LSD compared with state-of-the-art methods and with the Boundary Element Method (BEM) simulation provided by the database. A further discussion of the implementation results identifies limitations and potential improvements for future research in this domain.
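The noising and denoising described in the abstract follow the general DDPM formulation. The sketch below is a minimal pure-Python illustration under stated assumptions, not the thesis implementation:

- It uses a linear β schedule with the common defaults of Ho et al. (T = 1000, β from 1e-4 to 0.02), because the actual "predetermined schedule" is not given here.
- It uses the ε-prediction training loss.
- It uses ancestral sampling, in which a function `eps_model(x_t, i, cond)` stands in for the conditioned U-Net.

The toy HRIR, the `(azimuth, elevation)` conditioning tuple, and all function names are hypothetical.

```python
import math
import random

def linear_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (assumed) and its derived quantities.

    Index i = 0..T-1 corresponds to diffusion step t = i + 1.
    """
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars

def q_sample(x0, i, alpha_bars, rng):
    """Closed-form forward noising: x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, s = math.sqrt(alpha_bars[i]), math.sqrt(1.0 - alpha_bars[i])
    return [a * x + s * e for x, e in zip(x0, eps)], eps

def noise_mse(eps_true, eps_pred):
    """Simplified DDPM training loss: MSE between true and predicted noise."""
    return sum((a - b) ** 2 for a, b in zip(eps_true, eps_pred)) / len(eps_true)

def sample(eps_model, cond, length, betas, alphas, alpha_bars, rng):
    """Ancestral DDPM sampling from pure noise, conditioned on `cond`.

    eps_model(x_t, i, cond) returns the predicted noise (hypothetical
    stand-in for a conditioned U-Net).
    """
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]
    for i in reversed(range(len(betas))):
        eps = eps_model(x, i, cond)
        coef = betas[i] / math.sqrt(1.0 - alpha_bars[i])
        mean = [(xv - coef * ev) / math.sqrt(alphas[i]) for xv, ev in zip(x, eps)]
        if i > 0:
            sigma = math.sqrt(betas[i])  # variance choice sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # the final step adds no noise
    return x

# Toy 8-sample "HRIR" (a delayed, decaying impulse); not real data.
hrir = [0.0, 0.0, 1.0, -0.5, 0.25, -0.125, 0.0625, 0.0]
cond = (30.0, 0.0)  # hypothetical (azimuth, elevation) in degrees
betas, alphas, alpha_bars = linear_schedule(1000)
rng = random.Random(0)

# One training example: pick a step uniformly, noise the HRIR, score a prediction.
i = rng.randrange(len(betas))
x_t, eps = q_sample(hrir, i, alpha_bars, rng)
loss = noise_mse(eps, [0.0] * len(eps))  # zero "prediction", only to show the loss

# Sanity check: an oracle that knows the clean HRIR predicts the exact noise,
# so the sampler must return that HRIR.
def oracle(x, i, c):
    return [(xv - math.sqrt(alpha_bars[i]) * h) / math.sqrt(1.0 - alpha_bars[i])
            for xv, h in zip(x, hrir)]

recovered = sample(oracle, cond, len(hrir), betas, alphas, alpha_bars, rng)
```

The oracle check works because, at the last step, the update collapses exactly to the clean signal when the predicted noise is exact. A trained network only approximates that noise.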
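Leave-one-out cross-validation over subjects trains one model per subject, each time holding that subject out, and evaluates on the held-out subject. The following is a minimal sketch under stated assumptions:

- Subjects are identified by hypothetical string IDs.
- `train_fn` and `eval_fn` are placeholders for model training and a per-subject metric such as LSD.

```python
def leave_one_out_splits(subject_ids):
    """Yield (held_out, training_ids): one fold per subject."""
    ids = list(subject_ids)
    for k, held_out in enumerate(ids):
        yield held_out, ids[:k] + ids[k + 1:]

def cross_validate(subject_ids, train_fn, eval_fn):
    """Train on all subjects but one, then evaluate on the held-out subject."""
    scores = {}
    for held_out, train_ids in leave_one_out_splits(subject_ids):
        model = train_fn(train_ids)                   # placeholder: fit the model
        scores[held_out] = eval_fn(model, held_out)   # placeholder: per-subject metric
    return scores

# Hypothetical subject IDs and stand-in train/eval functions.
subjects = ["S01", "S02", "S03", "S04"]
folds = list(leave_one_out_splits(subjects))
scores = cross_validate(subjects, train_fn=lambda ids: len(ids), eval_fn=lambda m, s: m)
mean_score = sum(scores.values()) / len(scores)
```

Every fold retrains the model, so the cost grows linearly with the number of subjects. That is the price of testing on every subject while still training on nearly all available data.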
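The spectral metrics compare the DFT of the predicted and measured HRIRs. Their exact definitions vary slightly across the literature. This sketch assumes the following forms, both computed over the one-sided spectrum:

- NMSE = 10 log10( Σ|H(k) − Ĥ(k)|² / Σ|H(k)|² ) in dB.
- LSD = sqrt( (1/K) Σ (20 log10(|H(k)| / |Ĥ(k)|))² ) in dB.

A small magnitude floor avoids log(0). The naive DFT stands in for an FFT, and the function names are illustrative.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT, adequate for short impulse responses."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def spectrum(hrir):
    """One-sided spectrum (bins 0..N/2) of a real-valued HRIR."""
    return dft(hrir)[: len(hrir) // 2 + 1]

def nmse_db(h_true, h_pred):
    """10*log10(sum|H - H_hat|^2 / sum|H|^2); -inf for a perfect prediction."""
    H, H_hat = spectrum(h_true), spectrum(h_pred)
    num = sum(abs(a - b) ** 2 for a, b in zip(H, H_hat))
    den = sum(abs(a) ** 2 for a in H)
    return float("-inf") if num == 0 else 10.0 * math.log10(num / den)

def lsd_db(h_true, h_pred, floor=1e-12):
    """RMS over frequency bins of the dB magnitude difference."""
    H, H_hat = spectrum(h_true), spectrum(h_pred)
    diffs = [20.0 * math.log10((abs(a) + floor) / (abs(b) + floor))
             for a, b in zip(H, H_hat)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Toy check: predicting the HRIR at half amplitude gives a 6 dB gap.
h = [1.0, 0.5, 0.25, 0.125]
half = [0.5 * v for v in h]
print(round(lsd_db(h, half), 2), round(nmse_db(h, half), 2))  # 6.02 -6.02
```

NMSE uses the complex spectra, so it also penalizes phase and timing errors. LSD looks only at magnitudes, which is why the two can disagree for the same prediction.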
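For the time-domain metrics, the onset of each HRIR and the ITD derived from the left and right onsets can be scored against the ground truth with the mean absolute error. The onset detector below (first sample reaching 10 % of the peak magnitude) and the 48 kHz sampling rate are assumptions made for illustration. The thesis's exact onset definition is not stated here, and the impulse-based HRIRs are toy data.

```python
def onset_index(hrir, threshold_ratio=0.1):
    """Index of the first sample reaching threshold_ratio * peak magnitude (assumed detector)."""
    limit = threshold_ratio * max(abs(v) for v in hrir)
    return next(k for k, v in enumerate(hrir) if abs(v) >= limit)

def itd_seconds(hrir_left, hrir_right, fs):
    """ITD as the left-minus-right onset difference, in seconds."""
    return (onset_index(hrir_left) - onset_index(hrir_right)) / fs

def mae(true_vals, pred_vals):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(true_vals, pred_vals)) / len(true_vals)

def impulse(n, delay):
    """Toy HRIR: a unit impulse delayed by `delay` samples."""
    return [1.0 if k == delay else 0.0 for k in range(n)]

fs = 48000  # hypothetical sampling rate in Hz
# (left, right) HRIR pairs for two directions: ground truth vs prediction.
true_pairs = [(impulse(32, 10), impulse(32, 14)), (impulse(32, 12), impulse(32, 12))]
pred_pairs = [(impulse(32, 11), impulse(32, 14)), (impulse(32, 12), impulse(32, 13))]

onset_mae = mae([onset_index(h) for pair in true_pairs for h in pair],
                [onset_index(h) for pair in pred_pairs for h in pair])
itd_mae = mae([itd_seconds(left, right, fs) for left, right in true_pairs],
              [itd_seconds(left, right, fs) for left, right in pred_pairs])
print(onset_mae, round(itd_mae * 1e6, 2))  # 0.5 20.83
```

Here the onset MAE is in samples and the ITD MAE in seconds (printed in microseconds). A one-sample onset error at 48 kHz corresponds to about 20.8 µs.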
File | Description | Size | Format | Access
---|---|---|---|---
2024_07_Albarracin_thesis.pdf | MSc. Thesis - Article format | 2.92 MB | Adobe PDF | not accessible
2024_07_Albarracin_executive summary.pdf | MSc. Thesis - Executive Summary | 928.53 kB | Adobe PDF | not accessible
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/223812