Error-related Potentials Classification in Brain-Computer Interfaces: Validation of a novel LIME Framework for signal explainability

Growing interest in the EEG signal has driven numerous advances and the development of new technologies in neuroscience. Notable among these are Brain-Computer Interfaces (BCIs), which serve as support systems for people with physical disabilities. This thesis introduces a correction system for these interfaces based on the recognition of error-related potentials (ErrPs), which are generated in the subject's EEG when the observed outcome differs from the expected one. Its main contribution is a new framework built on LIME, a well-known library for explaining the predictions of machine learning models, including deep learning models. The framework is used to extract meaningful features for building a much simpler and more transparent machine learning model. In doing so, we aim to fill the gap that the library currently leaves in the handling of signals that evolve over time. We used a dataset containing EEG recordings of 6 subjects during interaction with a BCI. After initial preprocessing, a process known as subspace regularization was applied to the extracted EEG epochs to isolate the sources of the potentials; the resulting instances were then standardized and fed to a deep learning model known as EEGNet. Class imbalance was addressed by combining oversampling of the minority class through SMOTE with undersampling of the majority class. In response to the need for transparent and interpretable models in healthcare, an explainability process was then applied to this model, with the goal of extracting meaningful features to train a much simpler and more interpretable model. For this purpose, 3 types of features were considered: typical features that characterize the waveform of an ErrP, frequency-domain features, and the features that dominated the model's predictions. The latter were extracted with the LIME library.
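The class-balancing step described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes feature vectors stored as plain Python lists, uses hypothetical function names (`smote_like_oversample`, `random_undersample`, `balance`), and reproduces only the core SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours, combined with random undersampling of the majority class.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, rng=None):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours (core SMOTE idea).
    Assumes at least two minority samples."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples
        others = [m for m in minority if m is not x]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, nb)])
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep a random subset of n_keep majority samples (no replacement)."""
    if rng is None:
        rng = random.Random(0)
    return rng.sample(majority, n_keep)


def balance(majority, minority, target, k=3, seed=0):
    """Bring both classes to (at most) `target` samples each:
    the majority class is shrunk, the minority class is grown."""
    rng = random.Random(seed)
    kept = random_undersample(majority, min(target, len(majority)), rng)
    n_new = max(0, target - len(minority))
    grown = minority + smote_like_oversample(minority, n_new, k, rng)
    return kept, grown
```

Calling `balance(majority, minority, target=20)` on a 100-versus-4 toy problem would return 20 samples per class. In practice the balancing would be applied to the training split only, so that synthetic samples do not leak into evaluation; this is a general recommendation rather than a detail stated in the abstract.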
Feature extraction and the comparison of results were carried out with three different versions of LIME: the original framework, our framework, and an intermediate version between the two. The results confirm the validity of our framework: the final model built on the features it extracts achieves the best performance and the highest utility gain. Moreover, its performance is in line with that achieved by far more complex state-of-the-art models.
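To make the idea of a LIME-style explanation for time signals concrete, the following is a minimal sketch under stated assumptions. It is neither the API of the `lime` library nor the framework developed in the thesis. It assumes a single-channel signal stored as a list of floats, equal-length time segments as the interpretable units, zero-filling as the perturbation, and a weighted ridge regression as the local surrogate. All names (`perturb`, `solve`, `explain_segments`) and default parameters are hypothetical.

```python
import math
import random


def perturb(signal, mask, n_segments):
    """Zero out every time segment whose mask entry is 0."""
    seg_len = len(signal) // n_segments
    out = list(signal)
    for s, keep in enumerate(mask):
        if not keep:
            start = s * seg_len
            end = len(signal) if s == n_segments - 1 else start + seg_len
            out[start:end] = [0.0] * (end - start)
    return out


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def explain_segments(signal, predict, n_segments=4, n_samples=200,
                     kernel_width=0.75, ridge=1e-3, seed=0):
    """Return one importance weight per time segment.

    `predict` maps a signal (list of floats) to a scalar score,
    e.g. the probability of the ErrP class."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for i in range(n_samples):
        # The first sample is the unperturbed signal (all segments kept)
        mask = ([1] * n_segments if i == 0
                else [rng.randint(0, 1) for _ in range(n_segments)])
        off_fraction = 1.0 - sum(mask) / n_segments
        rows.append([1.0] + [float(m) for m in mask])  # leading 1 = intercept
        targets.append(predict(perturb(signal, mask, n_segments)))
        # Perturbations closer to the original signal get a larger weight
        weights.append(math.exp(-(off_fraction ** 2) / kernel_width ** 2))
    d = n_segments + 1
    # Weighted normal equations; ridge penalty on all terms but the intercept
    A = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
          + (ridge if i == j and i > 0 else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(w * r[i] * y for w, r, y in zip(weights, rows, targets))
         for i in range(d)]
    return solve(A, b)[1:]  # drop the intercept
```

As a usage example, with a toy `predict` that averages only samples 20 to 29 of a 40-sample signal, the third of four segments should receive by far the largest weight. A real application would replace the toy function with the trained classifier's output and would need to handle multiple EEG channels, which this sketch deliberately leaves out.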
