Imagined Speech (IS), the silent production of words without overt articulation, offers a promising avenue for Brain–Computer Interfaces (BCIs), with potential applications in assistive communication for individuals with severe motor impairments. Electroencephalography (EEG) provides a non-invasive, portable, and temporally precise modality for IS decoding, but challenges remain: EEG signals are noisy, they vary considerably across subjects, and most studies rely on small, closed vocabularies with artificial markers. This thesis introduces a new dataset of imagined Italian words, acquired under realistic, marker-free conditions, and investigates two complementary decoding pipelines. The first benchmarks classification models that combine handcrafted EEG features (spectral power, Hjorth parameters, autoregressive coefficients, fractal dimension) with representations extracted from Mantis, a time-series foundation model applied here to imagined speech decoding for the first time. The second explores semantic alignment by projecting EEG features into a linguistic embedding space (Word2Vec), a step toward open-vocabulary decoding. Experiments with convolutional neural networks showed that both handcrafted and foundation-model-derived features perform consistently above chance level, and that cross-subject pre-training followed by fine-tuning improves robustness. Hybrid ensembles combining both feature types further improved stability. Semantic alignment results, although affected by inter-subject variability, confirmed that EEG signals carry traces of semantic similarity beyond what simple classification captures. Overall, this work establishes the feasibility of imagined speech decoding under realistic conditions, introduces foundation models as valuable tools in this domain, and provides benchmarks for future studies. The findings highlight both the promise and the limitations of current approaches.
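The four families of handcrafted features named above can be illustrated with a minimal NumPy sketch. For each channel it computes band power from a one-sided periodogram, the three Hjorth parameters, Yule-Walker autoregressive coefficients and the Higuchi fractal dimension, then concatenates everything into one vector per epoch. The band edges, the AR order, `kmax` and all function names are illustrative assumptions: the abstract does not say which estimators or parameters the thesis uses, and Higuchi is only one of several fractal-dimension estimators.

```python
import numpy as np

# Hypothetical frequency bands (Hz); not taken from the thesis.
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0), "gamma": (30.0, 45.0)}


def band_power(x, fs, band):
    """Power of a 1-D signal in [lo, hi) Hz from a one-sided periodogram (no windowing)."""
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * n)
    # Double every bin except DC (and Nyquist when n is even) to get a one-sided PSD.
    if n % 2 == 0:
        psd[1:-1] *= 2
    else:
        psd[1:] *= 2
    lo, hi = band
    mask = (freqs >= lo) & (freqs < hi)
    return psd[mask].sum() * (fs / n)  # integrate over the band (bin width fs/n)


def hjorth(x):
    """Hjorth activity, mobility and complexity, using first differences as derivatives."""
    dx = np.diff(x)
    ddx = np.diff(dx)
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity


def ar_coefficients(x, order):
    """Yule-Walker estimate of a_1..a_p in x[t] = sum_k a_k * x[t-k] + e[t]."""
    x = x - x.mean()
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])


def higuchi_fd(x, kmax=10):
    """Higuchi fractal dimension: slope of log L(k) against log(1/k)."""
    n = len(x)
    ks = np.arange(1, kmax + 1)
    curve_lengths = []
    for k in ks:
        lengths = []
        for m in range(k):
            sub = x[m::k]  # subsampled series starting at offset m
            n_steps = len(sub) - 1
            if n_steps < 1:
                continue
            norm = (n - 1) / (n_steps * k)
            lengths.append(np.abs(np.diff(sub)).sum() * norm / k)
        curve_lengths.append(np.mean(lengths))
    slope, _ = np.polyfit(np.log(1.0 / ks), np.log(curve_lengths), 1)
    return slope


def extract_features(epoch, fs, ar_order=4, kmax=10):
    """Concatenate per-channel features for an epoch of shape (n_channels, n_samples)."""
    feats = []
    for ch in epoch:
        feats.extend(band_power(ch, fs, b) for b in BANDS.values())
        feats.extend(hjorth(ch))
        feats.extend(ar_coefficients(ch, ar_order))
        feats.append(higuchi_fd(ch, kmax))
    return np.asarray(feats)
```

With these hypothetical settings each channel contributes 12 values (4 band powers, 3 Hjorth parameters, 4 AR coefficients, 1 fractal dimension), so an 8-channel epoch yields a 96-dimensional vector that a classifier could consume alongside foundation-model embeddings.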
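Semantic alignment, as described above, can be sketched as a regression from EEG features into a word-embedding space followed by nearest-neighbour retrieval over candidate words. The abstract does not specify the projection, so this sketch assumes closed-form ridge regression and cosine-similarity ranking; in practice the rows of `vocab_vectors` would be Word2Vec embeddings, whereas here any array of vectors works. All function names are hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression from EEG features X (n, d) to embeddings Y (n, e)."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    return W, x_mean, y_mean


def predict_embeddings(model, X):
    """Map EEG features into the embedding space with a fitted ridge model."""
    W, x_mean, y_mean = model
    return (X - x_mean) @ W + y_mean


def rank_vocabulary(pred, vocab_vectors):
    """For each predicted embedding, vocabulary indices sorted by cosine similarity (best first)."""
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    v = vocab_vectors / np.linalg.norm(vocab_vectors, axis=1, keepdims=True)
    return np.argsort(-(p @ v.T), axis=1)


def top_k_accuracy(ranks, true_idx, k):
    """Fraction of trials whose true word is among the k best-ranked candidates."""
    return float(np.mean([t in r[:k] for r, t in zip(ranks, true_idx)]))
```

Because retrieval only needs a candidate word's embedding, rows for words never seen during training can be appended to `vocab_vectors` at test time; this is what makes the approach a step toward open-vocabulary decoding, although how well a linear map generalizes to unseen words is an empirical question.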
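Whether a classifier performs "above chance" is usually judged against a significance threshold rather than the bare 1/N level, because small test sets can exceed 1/N by luck. A minimal sketch, assuming independent trials and balanced classes, finds the smallest accuracy that a one-sided binomial test deems significant; the function names and the default α = 0.05 are illustrative and not taken from the thesis.

```python
from math import comb
from typing import Optional


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def chance_threshold(n_trials, n_classes, alpha=0.05) -> Optional[float]:
    """Smallest accuracy significantly above the 1/n_classes chance level.

    Uses a one-sided binomial test; returns None when even a perfect
    score would not be significant at this alpha.
    """
    p = 1.0 / n_classes
    for correct in range(n_trials + 1):
        if binomial_tail(n_trials, correct, p) < alpha:
            return correct / n_trials
    return None
```

For example, with 100 test trials in a two-class task, accuracy has to reach 59% to be significant at α = 0.05, noticeably above the nominal 50%.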
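One simple way to build a hybrid ensemble is soft voting: average the class probabilities produced by a model trained on handcrafted features and one trained on Mantis representations, then take the most probable class. The abstract does not state the fusion rule, so weighted probability averaging is an assumption here, and `soft_vote` is a hypothetical helper.

```python
import numpy as np


def soft_vote(prob_list, weights=None):
    """Fuse per-model class probabilities of shape (n_trials, n_classes) by weighted averaging.

    Returns the fused probabilities and the predicted class index for each trial.
    """
    probs = np.stack(prob_list)  # (n_models, n_trials, n_classes)
    w = np.ones(len(prob_list)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so fused rows still sum to 1
    fused = np.tensordot(w, probs, axes=1)  # weighted sum over the model axis
    return fused, fused.argmax(axis=1)
```

Averaging tends to damp the errors of whichever model is less reliable on a given trial, which is one plausible mechanism for the stability gains reported above; the weights could, for instance, be set from each model's validation accuracy.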
Hybrid models and semantic alignment for imagined speech decoding from EEG
Cacciamani, Matteo
2024/2025

| File | Description | Access | Size | Format |
|---|---|---|---|---|
| 2025_10_Cacciamani_Tesi.pdf | Thesis text | Authorized users only, from 30/09/2026 | 3.31 MB | Adobe PDF |
| 2025_10_Cacciamani_Executive_Summary.pdf | Executive summary | Authorized users only, from 30/09/2026 | 495.97 kB | Adobe PDF |
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/243297