Deaf and hard-of-hearing (DHH) people face barriers in face-to-face communication. These barriers concern understanding words, timing, tone cues, and safety-related information. This thesis presents an Artificial Intelligence (AI)-supported Augmented Reality (AR) assistive prototype for bidirectional communication. The system augments key cues on a see-through head-mounted display (HMD) so that users can maintain natural eye contact with their conversation partner. It comprises four components: (1) live speech-to-text captions with lightweight tone/affect labels; (2) brief keyword (gist) cues for information-rich utterances; (3) environmental sound alerts that deliver timely information, demonstrated as a proof of concept on a limited set of sound categories; and (4) sign-to-speech, which maps a preset gesture vocabulary to speech output. The system uses a distributed computing architecture: computation is offloaded from the wearable to a Raspberry Pi 5 hub, which runs the models, while the T-Glass module provides the AR interface. The design focused on stable real-time processing within the constraints of the wearable display. The study assessed usability with a post-task questionnaire on a 5-point Likert scale and measured objective performance on the sign-to-speech task. Safety & Trust was rated highly (M = 4.33) and perceived timeliness was adequate (M = 4.00). Usable accuracy was rated fair (M > 3.00). The main concern was wearability (M = 2.33). In the sign-to-speech task, users achieved an accuracy of 86.67% with a mean completion time of 3.3 s. The results support proof of feasibility, but the design could be improved by increasing robustness and giving more attention to user interaction with respect to usability. The work contributes a concept for a multimodal accessibility solution built on an interaction-centered AR architecture, and points to future work on wearability, uncertainty-aware presentation, and field trials with DHH participants.
Deaf and hard-of-hearing (DHH) people encounter barriers in face-to-face communication. These barriers involve the comprehension of words, timing, tone and intonation cues, and safety-related information. This thesis presents an assistive prototype based on Artificial Intelligence (AI) and Augmented Reality (AR) for two-way communication. On a see-through head-mounted display (HMD), the system enriches selected key signals so that people can keep natural eye contact with the person they are talking to. The prototype includes: (1) "real-time" communication captions via speech-to-text, with lightweight labels for paralinguistic cues (e.g. tone/affect); (2) brief keyword cues (keyword brief / gist cues) for information-rich utterances; (3) environmental sound alerts (timely information), with a proof-of-concept demonstration limited to some sound categories; and (4) a sign-to-speech module that maps a predefined gesture vocabulary to speech output. The system adopts a distributed computing architecture. The computational load is moved from the wearable device to a Raspberry Pi 5 (hub) for model execution, while T-Glass handles the AR interface. The design concentrated on the stability of real-time processing within the constraints of the wearable display. The study evaluated usability through a post-task questionnaire on a 5-point Likert scale and objectively measured performance on the sign-to-speech task. The results show a high score for Safety & Trust (M = 4.33) and adequate perceived timeliness (M = 4.00). Usable accuracy was fair (M > 3.00). The main issue concerns wearability/comfort (M = 2.33). In the sign-to-speech task, users reached an accuracy of 86.67% with a mean completion time of 3.3 s.
Overall, the results support a proof of feasibility; however, the design can be improved by increasing robustness and by giving greater consideration to user interaction in relation to usability. The work proposes a multimodal accessibility solution concept based on an interaction-centered AR architecture. It also indicates future directions on wearability, uncertainty-aware presentation, and field trials with DHH participants.
Design and evaluation of multimodal AI-enhanced AR glasses for two-way communication in deaf and hard-of-hearing contexts
WU, JUNXI
2024/2025
| File | Size | Format |
|---|---|---|
| Design and Evaluation of Multimodal AI-Enhanced AR Glasses for Two-Way Communication in Deaf and Hard-of-Hearing Contexts.pdf (openly accessible on the internet to everyone) | 4.51 MB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/252866