Affective computing is the study and development of systems and devices able to recognize, interpret, process, and simulate human affect. Building effective models requires data, which means evoking specific emotional states in human subjects and recording their reactions. The objective of this thesis is to test virtual reality headsets as a novel method for emotion elicitation, a choice supported by the existing literature. The research is based on both subjective experience and insights extracted from physiological signals. Its primary expected output is a set of efficient machine learning models capable of correctly classifying the emotional state of a subject from bio-signals acquired while the headset is in use. To this end, we considered the two dimensions of Russell's circumplex model, arousal and valence. A second aim is to highlight the potential of advanced signal processing methods to extract meaningful features for the emotion separation task. The elicitation protocol was designed following existing heuristics, established findings of emotion theory, and feedback from a focus group. It was developed with Unity and integrated with external software for signal acquisition. Electrocardiogram (ECG), blood volume pulse (BVP or PPG), galvanic skin response (GSR or EDA), and respiration signals were collected using a four-channel ProComp device. From these signals, several specific time series were extracted and analyzed, the most noteworthy being pulse pressure (PP) and pulse arrival time (PAT). To validate the emotional experiences (neutral scenes, sadness, relaxation, happiness, fear), we administered a survey to each subject after completion of the protocol. The most relevant and original design ideas were delivering compound stimuli, as in a life-like scenario, and combining visual and audio layers to amplify the sense of presence.
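As an illustration of one of the derived time series, the sketch below computes a beat-by-beat pulse arrival time under a common definition: the delay between an ECG R-peak and the next pulse foot detected on the PPG signal. The abstract does not state which fiducial point the thesis uses, so the choice of the pulse foot, the function name `pulse_arrival_times`, the `max_delay_s` cutoff, and the sample timestamps are all assumptions made for illustration.

```python
from bisect import bisect_right


def pulse_arrival_times(r_peaks_s, ppg_feet_s, max_delay_s=0.5):
    """Pair each ECG R-peak with the first PPG pulse foot that follows it.

    Both inputs are sorted lists of event times in seconds.
    Returns a list of (r_peak_time, pat_seconds) tuples. Beats with no
    pulse foot within max_delay_s (an assumed plausibility cutoff) are
    skipped rather than paired with a later, unrelated pulse.
    """
    pat_series = []
    for r in r_peaks_s:
        # Index of the first pulse foot strictly after this R-peak.
        i = bisect_right(ppg_feet_s, r)
        if i < len(ppg_feet_s):
            delay = ppg_feet_s[i] - r
            if delay <= max_delay_s:
                pat_series.append((r, delay))
    return pat_series


# Hypothetical event times, for illustration only.
r_peaks = [0.80, 1.62, 2.45, 3.30]
ppg_feet = [1.02, 1.85, 2.66]
for r_time, pat in pulse_arrival_times(r_peaks, ppg_feet):
    print(f"R-peak at {r_time:.2f} s, PAT = {pat * 1000:.0f} ms")
```

The last R-peak in the example has no following pulse foot, so it produces no PAT value; dropping such beats keeps the series free of spurious long delays.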
To the best of our knowledge, this is the first study to integrate advanced signal processing with virtual reality. Specifically, point process modeling of heart rate variability (HRV) and complex metrics such as Lyapunov exponents have never been used in this context. A preliminary analysis of the data led to several new definitions that integrate information extracted from the surveys with physiological trends. We analyzed the differences in response across the tested population. The results showed at least five sub-groups, distinguished by their cardio-vascular-galvanic response. This finding is tied to the consistency map, an original definition: a scoring system that assigns a specific weight to each observation based on the subjective experience reported with the Self-Assessment Manikin (SAM). Subjects with the highest consistency tend to show galvanic activation and possibly a sympathetic vascular response. Subjects with medium responses were the only ones to show cardiovascular sympathetic activation without any visible galvanic response. The information obtained from this analysis was then translated into ad hoc weights for the ML approaches. The statistical separation power of each feature was investigated in more detail with the Friedman test. The results showed EDA and PP to be the most relevant signals in this context. Although they performed less well, PAT, cardiac, and respiration features still had some statistical power. Point process features did not appear to be statistically significant. We designed several ML models for automatic emotion recognition. The testing pipeline was defined to optimize both the feature selection method (Sequential Feature Selection, K best, Square method) and the model's hyperparameters.
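To make the Friedman test concrete, the following minimal sketch computes its chi-square statistic for one feature measured on the same subjects under several emotional conditions. It is not the thesis code: the function names and the numbers are hypothetical, and the tie-correction factor is omitted for brevity. In practice, `scipy.stats.friedmanchisquare` returns both the statistic and the p-value.

```python
def average_ranks(values):
    """Rank values from 1 to k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square statistic (without tie correction).

    table[i][j] is the feature value of subject i under condition j.
    Values are ranked within each subject, then rank sums per condition
    are compared. Under the null hypothesis the statistic approximately
    follows a chi-square distribution with k - 1 degrees of freedom.
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical feature values: 4 subjects (rows) x 3 conditions (columns).
feature_table = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 5.0],
    [0.5, 1.0, 4.0],
    [3.0, 4.0, 9.0],
]
print(friedman_statistic(feature_table))
```

In this example every subject orders the three conditions the same way, so the statistic reaches its maximum of n(k - 1) = 8; a feature whose ordering varies randomly between subjects would score close to 0.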
In the four-emotion discrimination task, the best model (logistic regression) reached 61% accuracy and AUC scores of [0.79, 0.75, 0.65, 0.96] using the one-versus-all method (OVA, also called one-versus-rest, OVR). For arousal, a KNN classifier achieved 89% accuracy and 0.94 AUC. For valence detection, an SVM scored 72% accuracy and 0.71 AUC. These results indicate that point process features are useful for separating the observations despite their lower statistical validity in the previous analysis steps. The most relevant signals were GSR and PPG, which are non-invasive and easy to measure with third-party wearable solutions or directly with some head-mounted displays (HMDs). The last achievement was the design and integration of a physiological dimension into Russell's circumplex model. The extra dimension was obtained as a linear combination of the features n peaks GSR, sd amp peaks GSR, and avg pp PP. In the new domain, the mean of each distribution appears well separated from the others. To conclude, this study demonstrates the ability of physiological signals to discriminate the emotional states of different subjects, using a fast and reliable method to select the most important features describing the influence of the autonomic nervous system (ANS). The approach has high clinical relevance, as it could be extended to estimate other emotional states (e.g., stress and pain) in a controlled virtual environment and to characterize pathological conditions such as post-traumatic stress disorder and depression.
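The per-class AUC scores reported above come from treating each emotion in turn as the positive class against all the others. A minimal sketch of that one-versus-rest computation follows, using the pairwise-comparison (Mann-Whitney) form of the AUC. The labels, scores, and function names are hypothetical and only illustrate the metric, not the thesis pipeline.

```python
def binary_auc(scores, is_positive):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if sp > sn else 0.5 if sp == sn else 0.0
        for sp in pos
        for sn in neg
    )
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(labels, class_scores):
    """Compute one AUC per class, with that class against all the others.

    labels: true class of each observation.
    class_scores: maps each class to the classifier's score for that
    class on every observation (same order as labels).
    """
    return {
        cls: binary_auc(scores, [y == cls for y in labels])
        for cls, scores in class_scores.items()
    }


# Hypothetical predictions for four observations, for illustration only.
labels = ["fear", "sadness", "fear", "relaxation"]
class_scores = {
    "fear": [0.90, 0.20, 0.60, 0.70],
    "sadness": [0.05, 0.70, 0.30, 0.10],
    "relaxation": [0.05, 0.10, 0.10, 0.20],
}
print(one_vs_rest_auc(labels, class_scores))
```

Reporting one AUC per class, as the abstract does, shows which emotions are easy or hard to separate; scikit-learn's `roc_auc_score` with `multi_class="ovr"` instead returns a single averaged value.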
A novel affective computing framework based on VR elicitation and advanced signal processing emotion recognition assessment
VALDES REY, ALBERTO
2021/2022
File | Description | Size | Format | Access
---|---|---|---|---
A novel affective computing framework based on VR elicitation and advanced signal processing emotion recognition assessment.pdf | Thesis document | 60.25 MB | Adobe PDF | Available online to authorized users only
Executive_Summary_AVR.pdf | Executive summary of the thesis | 480.34 kB | Adobe PDF | Available online to authorized users only
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/192353