Volumetric scar segmentation using 3D deep learning architecture applied to late gadolinium enhancement cardiac MRI

Cardiovascular diseases (CVDs) are the leading cause of death worldwide: according to the World Health Organization (WHO), about 17.7 million people died from CVDs in 2015, accounting for 31% of all deaths worldwide. One of the most common histological features of heart failure is the presence of myocardial fibrotic scars, which result from the tissue-repair process. In current clinical practice, the scar is identified and evaluated with medical imaging, in particular cardiac magnetic resonance (CMR) with late gadolinium enhancement (LGE). CMR-LGE uses gadolinium (Gd), a contrast medium injected intravenously about 10-20 minutes before the CMR is performed, so that by then it has already been cleared from healthy tissue and remains only in the fibrotic tissue. CMR-LGE images are segmented to identify the contours of the epicardium and endocardium in each CMR-LGE slice and, subsequently, of the fibrotic scar tissue. Currently, this segmentation is performed by an operator, either manually or semi-automatically, using the n-standard deviation (n-SD) and full width at half maximum (FWHM) algorithms.
This work proposes the application of U-Net 3D, a 3D convolutional neural network trained with supervised learning, to scar segmentation on CMR-LGE images. The network takes 3D volumes as input and processes them with the corresponding 3D operations, such as 3D convolutions, 3D max pooling, and 3D up-convolutional layers. In addition, we avoid bottlenecks in the network architecture and use batch normalization for faster convergence. In the analysis path, each layer contains two 3 × 3 × 3 convolutions, each followed by a ReLU, and then a 2 × 2 × 2 max pooling with a stride of two in each dimension. In the synthesis path, each layer consists of a 2 × 2 × 2 up-convolution with a stride of two in each dimension, followed by two 3 × 3 × 3 convolutions, each followed by a ReLU.
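The analysis and synthesis paths described above can be followed by tracing how the spatial size of a volume changes from stage to stage. The sketch below is only a shape calculator, not the network used in this work: the function names, the number of pooling steps (`poolings=3`), the presence of a convolution pair at the lowest resolution, and the choice between padded and unpadded convolutions are all assumptions, since the text does not state them.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]


def conv_pair(shape: Shape, padded: bool) -> Shape:
    """Spatial size after two 3x3x3 convolutions with stride 1.

    Without padding, each convolution trims one voxel per side (2 per axis).
    """
    trim = 0 if padded else 4
    d, h, w = (s - trim for s in shape)
    if min(d, h, w) < 1:
        raise ValueError(f"volume {shape} is too small for two 3x3x3 convolutions")
    return (d, h, w)


def trace_unet3d(input_shape: Shape, poolings: int = 3,
                 padded: bool = True) -> List[Tuple[str, Shape]]:
    """Return (stage, spatial shape) pairs along the analysis and synthesis paths."""
    shape = input_shape
    trace = [("input", shape)]
    # Analysis path: two convolutions, then 2x2x2 max pooling with stride 2.
    for level in range(poolings):
        shape = conv_pair(shape, padded)
        trace.append((f"analysis {level}: convs", shape))
        if any(s % 2 for s in shape):
            raise ValueError(f"{shape} cannot be halved exactly at level {level}")
        shape = (shape[0] // 2, shape[1] // 2, shape[2] // 2)
        trace.append((f"analysis {level}: pool", shape))
    # Lowest resolution: two convolutions and no pooling (an assumption).
    shape = conv_pair(shape, padded)
    trace.append(("bottom: convs", shape))
    # Synthesis path: 2x2x2 up-convolution with stride 2, then two convolutions.
    for level in reversed(range(poolings)):
        shape = (shape[0] * 2, shape[1] * 2, shape[2] * 2)
        trace.append((f"synthesis {level}: upconv", shape))
        shape = conv_pair(shape, padded)
        trace.append((f"synthesis {level}: convs", shape))
    return trace


if __name__ == "__main__":
    # Hypothetical input size, chosen only so that every pooling halves exactly.
    for stage, shape in trace_unet3d((64, 64, 16)):
        print(f"{stage:24s} {shape}")
```

With padded convolutions the output has the same spatial size as the input, so each voxel receives a label. With `padded=False` every convolution trims the volume, and feature maps from the analysis path would need cropping before they can be combined with those of the synthesis path.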
Shortcut connections from layers of equal resolution in the analysis path provide the essential high-resolution features to the synthesis path. In the last layer, a 1 × 1 × 1 convolution reduces the number of output channels to the number of labels. An important part of the architecture is the weighted softmax loss function: setting the weights of unlabeled pixels to zero makes it possible to learn only from the labelled ones and, hence, to generalize to the whole volume.
In this work, U-Net 3D was evaluated with three experimental protocols: one for detecting the contours of the epicardium and endocardium, and two for detecting the contours of cardiac scars, without or with information about the region of interest in which to search. After the evaluation, the results were compared with those of another neural network, an E-Net, developed by Riccardo Banali in the master thesis "A Deep Learning Approach for Scar Segmentation from Late Gadolinium Enhancement Cardiac Magnetic Resonance Images". Protocol 1 (P1) took CMR-LGE volumes as input, with the aim of identifying the epicardium and endocardium contours on them. Protocol 2 (P2) took CMR-LGE volumes as input, with the aim of identifying the contours of cardiac scars on them. Protocol 3 (P3) took as input the CMR-LGE volumes multiplied by the corresponding binary mask volumes of the myocardial contours, which restricts the search region to the myocardium, with the aim of identifying the contours of cardiac scars. In each protocol, the network was trained using Leave-One-Patient-Out cross-validation on a dataset of 29 volumes, obtained from the CMR-LGE sequences of 29 patients, together with the corresponding binary masks obtained by manual segmentation of the epicardial and endocardial contours and of the scars.
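The Leave-One-Patient-Out scheme mentioned above can be sketched as a split over patient identifiers: each fold holds out every sample that belongs to one patient and trains on the rest. This is a minimal illustration, not the code used in this work; the function name and the identifier strings are hypothetical, and it assumes that each sample is tagged with the patient it was derived from.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_patient_out(
    patient_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_patient, train_indices, test_indices), one fold per patient.

    patient_ids[i] names the patient that sample i was derived from, so every
    sample of the held-out patient is excluded from training.
    """
    # dict.fromkeys keeps first-seen order while removing duplicates.
    for held_out in dict.fromkeys(patient_ids):
        train = [i for i, pid in enumerate(patient_ids) if pid != held_out]
        test = [i for i, pid in enumerate(patient_ids) if pid == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Hypothetical identifiers: 29 patients with one volume each.
    ids = [f"patient_{n:02d}" for n in range(1, 30)]
    folds = list(leave_one_patient_out(ids))
    print(len(folds), len(folds[0][1]), len(folds[0][2]))  # 29 28 1
```

Splitting by patient rather than by sample matters whenever several samples come from the same patient: grouping them keeps data from the held-out patient out of the training set.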
The volumes were augmented with a data augmentation algorithm that applies geometrical transformations to the slices of each volume, yielding 203 volumes. Each volume was then split into 4 patches, thus obtaining the 812 volumes used to train U-Net 3D. Network performance in the three protocols was evaluated by computing accuracy, sensitivity, specificity, and the Dice Similarity Coefficient (DSC). The results of the three protocols are summarized below:
• P1: accuracy was 89.49% with an interquartile range (IQR) of 2.58%, specificity 93.98% with an IQR of 2.21%, and sensitivity 78.31% with an IQR of 7.19%. The DSC was 80.41% with an IQR of 6.82%.
• P2: accuracy was 94.15% with an IQR of 0.92%, specificity 97.58% with an IQR of 0.42%, and sensitivity 28.22% with an IQR of 9.72%. The DSC was 41.79% with an IQR of 10.45%.
• P3: accuracy was 94.23% with an IQR of 1.54%, specificity 97.27% with an IQR of 1.95%, and sensitivity 53.75% with an IQR of 14.52%. The DSC was 54.27% with an IQR of 20.73%.
P1 shows promising results when compared with other deep learning algorithms in the literature that address this task (Wang et al., DSC 0.8301 ± 0.08; Zheng et al., DSC 0.7520 ± 0.06). Compared with the deep learning algorithm developed by Zabihollaby et al., the results of P2 and P3 are considerably lower. Comparing P2 and P3, the latter achieved higher accuracy, sensitivity and DSC, with a slightly lower specificity (97.27% versus 97.58%), which indicates that limiting the informative content of the volume to the region of interest for scar detection (in this case, the myocardial contour) improves network performance.
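The four indexes reported above all derive from the counts of true and false positives and negatives between a predicted mask and the manual one. The sketch below shows one common way to compute them; the function name, the flattened 0/1 list representation of the masks, and the choice to return NaN for an undefined ratio are assumptions made for illustration.

```python
from typing import Dict, Sequence


def segmentation_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and DSC for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)          # scar, found
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)  # background, rejected
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)      # false alarm
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)      # missed scar

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "dsc": ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Toy masks: 2 foreground voxels out of 10, one of them found, one false alarm.
    pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    truth = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print(segmentation_metrics(pred, truth))
    # {'accuracy': 0.8, 'sensitivity': 0.5, 'specificity': 0.875, 'dsc': 0.5}
```

The toy example shows why accuracy and specificity can stay high while sensitivity and DSC are low: when the target occupies few voxels, the background dominates the first two indexes.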
A comparison was also made with a 2D network, the E-Net, using the same data. The results of the three protocols with E-Net are summarized below:
• P1: accuracy was 93.55% with an interquartile range (IQR) of 1.38%, specificity 94.26% with an IQR of 1.67%, and sensitivity 91.77% with an IQR of 4.35%. The DSC was 85.52% with an IQR of 7.59%.
• P2: accuracy was 95.79% with an IQR of 3.55%, specificity 97.31% with an IQR of 3.01%, and sensitivity 68.77% with an IQR of 34.83%. The DSC was 55.29% with an IQR of 41.03%.
• P3: accuracy was 96.83% with an IQR of 3.26%, specificity 97.89% with an IQR of 2.93%, and sensitivity 88.07% with an IQR of 17.84%. The DSC was 71.25% with an IQR of 31.82%.
In conclusion, the results of this thesis are a good starting point for the 3D detection of fibrotic scar tissue in CMR-LGE volumes, and they support the use of deep learning in the field to reduce operator intervention and, consequently, processing times.

Cardiovascular diseases (CVDs) are the leading cause of death worldwide: according to WHO data, about 17.7 million people died from CVDs in 2015, which corresponds to 31% of all deaths globally. One of the most common histological features of heart failure is the presence of myocardial fibrotic scars, caused by the tissue-repair process. Today, to identify and assess cardiac scars, clinical practice relies on medical imaging and on cardiac magnetic resonance (CMR) with late gadolinium enhancement (LGE). CMR-LGE uses gadolinium (Gd), a contrast medium injected intravenously about 10-20 minutes before the CMR is performed, so that it has already been cleared from healthy tissue and remains only in the fibrotic tissue. CMR-LGE images are segmented to identify the contours of the epicardium and endocardium in each CMR-LGE slice, after which the fibrotic scar tissue is identified. At present, image segmentation is performed manually or semi-automatically by an operator, using the n-standard deviation (n-SD) and FWHM algorithms. This work proposes the application of the U-Net 3D network, a 3D convolutional neural network with supervised learning designed to segment scars on CMR-LGE images. The network takes 3D volumes as input and processes them with the corresponding 3D operations, such as 3D convolutions, 3D max pooling and 3D up-convolutional layers. In addition, we avoid bottlenecks in the network architecture and use batch normalization for faster convergence.
As for its architecture, in the analysis part of the network each layer contains two 3 × 3 × 3 convolutions, each followed by a ReLU, and then a 2 × 2 × 2 max pooling with a stride of two in each dimension. In the synthesis part of the architecture, each layer consists of a 2 × 2 × 2 up-convolution with a stride of two in each dimension, followed by two 3 × 3 × 3 convolutions, each followed by a ReLU. Shortcut connections from layers of equal resolution in the analysis path provide the essential high-resolution features to the synthesis path. The last layer is a 1 × 1 × 1 convolution, which reduces the number of output channels to the number of labels. An essential part of the architecture is the weighted softmax loss function: setting the weights of unlabeled pixels to zero makes it possible to learn only from the labelled ones and, therefore, to generalize to the whole volume. In this work, the U-Net 3D network was evaluated with three experimental protocols: one for detecting the contours of the epicardium and endocardium, and the other two for detecting the contours of cardiac scars. After the evaluation, the results were compared with those of another neural network, the E-Net, developed by Riccardo Banali in the work "A Deep Learning Approach for Scar Segmentation from Late Gadolinium Enhancement Cardiac Magnetic Resonance Images". Protocol 1 (P1) uses the U-Net 3D network with CMR-LGE volumes as input, with the aim of identifying the contours of the epicardium and endocardium on them. Protocol 2 (P2) uses the network with CMR-LGE volumes as input, with the aim of identifying the contours of cardiac scars on them.
Protocol 3 (P3) uses the network with, as input, the CMR-LGE volumes multiplied by the corresponding binary mask volumes of the myocardial contours, so as to restrict the search region to the myocardium alone, with the aim of identifying the contours of cardiac scars more easily. For each protocol, the network was trained using Leave-One-Patient-Out cross-validation on a dataset of 29 volumes, obtained from the CMR-LGE sequences of 29 patients, together with the corresponding binary masks obtained by manual segmentation of the epicardial and endocardial contours and of the scars. The volumes were augmented with a data augmentation algorithm that applies geometrical transformations to the slices that make up the volumes, yielding 203 volumes. Each volume was then divided into 4 patches, yielding the 812 volumes used to train the U-Net 3D network. To evaluate the performance of the network in the three protocols, the accuracy, sensitivity and specificity indexes were computed, together with the Dice Similarity Coefficient (DSC) for assessing segmentation quality. The results of the three protocols are summarized below:
• P1: accuracy was 89.49% with an interquartile range (IQR) of 2.58%, specificity 93.98% with an IQR of 2.21%, and sensitivity 78.31% with an IQR of 7.19%. The DSC was 80.41% with an IQR of 6.82%.
• P2: accuracy was 94.15% with an IQR of 0.92%, specificity 97.58% with an IQR of 0.42%, and sensitivity 28.22% with an IQR of 9.72%. The DSC was 41.79% with an IQR of 10.45%.
• P3: accuracy was 94.23% with an IQR of 1.54%, specificity 97.27% with an IQR of 1.95%, and sensitivity 53.75% with an IQR of 14.52%. The DSC was 54.27% with an IQR of 20.73%.
P1 shows promising results when compared with other deep learning algorithms in the literature (Wang et al., DSC 0.8301 ± 0.08; Zheng et al., DSC 0.7520 ± 0.06). Compared with the deep learning algorithm in the literature developed by Zabihollaby et al., the results of protocols P2 and P3 are considerably lower. Comparing P2 and P3, the latter reached higher accuracy, sensitivity and DSC, with a slightly lower specificity (97.27% versus 97.58%), which indicates that limiting the informative content of the volume to the region of interest for scar detection (in this case, the myocardial contour) improves the performance of the network. A comparison was also made with a 2D network, the E-Net, using the same data. The results of the three protocols with E-Net are summarized below:
• P1: accuracy was 93.55% with an interquartile range (IQR) of 1.38%, specificity 94.26% with an IQR of 1.67%, and sensitivity 91.77% with an IQR of 4.35%. The DSC was 85.52% with an IQR of 7.59%.
• P2: accuracy was 95.79% with an IQR of 3.55%, specificity 97.31% with an IQR of 3.01%, and sensitivity 68.77% with an IQR of 34.83%. The DSC was 55.29% with an IQR of 41.03%.
• P3: accuracy was 96.83% with an IQR of 3.26%, specificity 97.89% with an IQR of 2.93%, and sensitivity 88.07% with an IQR of 17.84%. The DSC was 71.25% with an IQR of 31.82%.
In conclusion, the results of this work are a good starting point for the 3D detection of fibrotic scar tissue in CMR-LGE volumes, and they support the use of deep learning in the field to reduce operator intervention and, consequently, processing times.