Speech Emotion Recognition in a Storytelling-based Serious Game
CRIPPA, MATTIA
2022/2023
Abstract
The present thesis aims to develop a Speech Emotion Recognition (SER) process focused on analyzing the acoustic features of vocal recordings in which subjects express specific emotions. The ultimate goal is to recognize, by means of supervised learning algorithms, whether the emotion has actually been expressed. The collection of vocal data is facilitated by a Serious Game that combines playful and educational elements, is designed to adapt to various age groups, and is distributed on Android devices. Emotional expression takes place through emotional storytelling, an approach that allows individuals to analyze their experiences internally and empathetically and then express the emotion through a personal narrative. The use of acoustic feature extractors produced three distinct datasets, obtained by combining feature sets derived from different extraction algorithms and cross-validating them. In the subsequent classification phase, four supervised learning models were employed: support vector classifier, decision tree classifier, random forest classifier, and XGBoost classifier. Feedback on the application's usability, assessed with the System Usability Scale (SUS) questionnaire, yielded significant results, validating both the tool and the emotional expression method. The classification results are encouraging, although they suggest the need to stimulate the subject's emotional state further before the emotion is expressed: the elicitation should focus not only on the valence of the expressed emotion but also on making the arousal state more vivid in the subject's narratives, since arousal was found to be attenuated in recollection. The analysis of valence, by contrast, already yields observable results.
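As a purely illustrative sketch of the classification phase described above (not code from the thesis), the snippet below compares the four named models on a pre-extracted acoustic feature matrix using scikit-learn and xgboost; the library choices, function names, and data layout (a feature matrix `X` with one row per recording and integer emotion labels `y`) are assumptions.

```python
# Hypothetical sketch: comparing the four classifiers named in the abstract
# on pre-extracted acoustic features. Only the model list comes from the thesis;
# the evaluation setup is an assumption.
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier


def compare_classifiers(X: np.ndarray, y: np.ndarray) -> dict:
    """Return mean cross-validated accuracy for each of the four models.

    X: acoustic feature matrix (one row per recording), y: integer emotion labels.
    """
    models = {
        "support vector classifier": make_pipeline(StandardScaler(), SVC()),
        "decision tree classifier": DecisionTreeClassifier(random_state=0),
        "random forest classifier": RandomForestClassifier(random_state=0),
        "XGBoost classifier": XGBClassifier(),
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }
```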
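For the usability assessment, the standard SUS scoring rule (odd-numbered items contribute response − 1, even-numbered items contribute 5 − response, and the sum is scaled by 2.5 to a 0–100 range) can be computed as in the following sketch; the function name and input format are illustrative, not taken from the thesis.

```python
def sus_score(responses: list[int]) -> float:
    """Standard System Usability Scale score (0-100) from the ten
    Likert-scale answers (1-5), given in questionnaire order."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten answers in the range 1-5")
    # Odd-numbered items (1st, 3rd, ...) score r - 1; even-numbered items score 5 - r.
    contributions = [r - 1 if i % 2 == 0 else 5 - r for i, r in enumerate(responses)]
    return 2.5 * sum(contributions)
```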
File | Description | Size | Format | Access
---|---|---|---|---
2023_12_Crippa_Thesis_01.pdf | Mattia Crippa Master Thesis | 1.9 MB | Adobe PDF | openly accessible online
2023_12_Crippa_Executive_Summary_02.pdf | Mattia Crippa Executive Summary | 812.46 kB | Adobe PDF | openly accessible online
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/214541