Heartfelt words. Automated report generation for ECG signals

The electrocardiogram (ECG) is a commonly used tool for evaluating the heart. It consists of placing electrodes on a patient and measuring the electrical activity of the heart, which takes the form of a signal. Changes in this signal can be signs of many heart-related conditions. The task of using machine learning to classify arrhythmias in these signals has been extensively explored in the literature. In this thesis we present a different approach: instead of focusing on classification, we propose using language models, in particular a GPT-2 model, to automatically generate the textual report, written by a physician, that typically accompanies the ECG signal. We propose a model that employs an encoder-decoder architecture, where the encoder encodes the input ECG signal and feeds it to the decoder, which produces the final text report. Moreover, we propose a series of experiments to find the best model architecture for generating captions, and find that a model composed of a multi-label ResNet encoder and the GPT-2 decoder achieves a test BLEU-1 score of 0.442. We compare our model to the state of the art and find that using a language model decoder improves performance. Lastly, we attempt to create a multilingual model to take full advantage of our dataset, which is composed of both German and English captions. The approach used in this thesis makes it possible to encode ECG signals as tokens that can be processed by language models, making use of these highly advanced networks and allowing for a much more flexible process that can be easily expanded and updated.
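As an aid to reading the BLEU-1 score reported above, the following is a minimal sketch of how a sentence-level BLEU-1 value can be computed: clipped unigram precision multiplied by a brevity penalty. It assumes whitespace tokenization, lowercasing, and a single reference report; the function name `bleu1` and the example report strings are hypothetical, and the tokenization, smoothing, and aggregation actually used in the thesis are not stated here and may differ.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1 against a single reference (illustrative sketch)."""
    # Assumption: simple lowercased whitespace tokenization.
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens:
        return 0.0

    cand_counts = Counter(cand_tokens)
    ref_counts = Counter(ref_tokens)

    # Clipped unigram matches: a candidate token is credited at most
    # as many times as it occurs in the reference.
    clipped = sum(min(count, ref_counts[token])
                  for token, count in cand_counts.items())
    precision = clipped / len(cand_tokens)

    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand_tokens) >= len(ref_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(ref_tokens) / len(cand_tokens))

    return brevity_penalty * precision


if __name__ == "__main__":
    # Hypothetical report strings, invented for illustration only.
    reference = "sinus rhythm normal ecg"
    generated = "sinus rhythm abnormal ecg"
    print(bleu1(generated, reference))
```

In this hypothetical pair, three of the four generated words appear in the reference and the lengths match, so the score is the unigram precision alone. A score such as 0.442 is typically an aggregate over the whole test set rather than a single sentence value.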

The electrocardiogram (ECG) is a commonly used tool for assessing a patient's cardiac state. It consists of placing electrodes on the patient and measuring the electrical activity of the heart, which takes the form of a time signal. Changes in this signal can be a sign of many cardiac pathologies. The use of machine learning to automate the detection and classification of cardiac arrhythmias has been studied in depth in the scientific literature. In this thesis, instead of concentrating on this classification aspect, we propose a model based on language models, in particular GPT-2, to automatically generate the textual reports that often accompany ECGs. We propose a model with an architecture composed of an encoder and a decoder, where the signal is given as input to the encoder, which processes it and passes it to the decoder, which in turn generates the final report. We also propose a series of experiments to find the best structure for generating useful reports. The best model we obtain from these experiments is composed of a multi-label ResNet encoder and the GPT-2 decoder, and reaches a score of 0.442 on the test set with the BLEU-1 evaluation metric. We compare our model with the state of the art and find that using language models improves performance. Finally, we try to create a multilingual version of the model to take full advantage of our dataset, which is composed of texts in German and English. The approach used in this thesis makes it possible to encode ECG signals as tokens that can be processed by language models, which allows us to exploit these highly advanced networks and yields a much more flexible process that can eventually be extended with other information.