Pancreatic cancer is anticipated to become the second leading cause of cancer-related deaths in the next decade. Early diagnosis is crucial to significantly improving survival rates. However, conventional 3D medical imaging methods fall short of achieving early detection, and endoscopic ultrasound (EUS) is presently the sole viable option. Mastering this imaging technique is challenging for gastroenterologists, which results in suboptimal pancreas screening procedures that compromise diagnostic accuracy. To address these challenges, this project aims to develop a computer vision (CV) deep learning model that uses a Multi-Task Learning (MTL) approach to perform two tasks concurrently and in real time on EUS images: classifying the observation viewpoint (referred to as "phase") and performing object detection to locate and recognize anatomical structures such as pancreas parenchyma tissue, non-neoplastic soft tissue tumors, and neoplastic lesions. The intended benefits to surgeons include improved navigation and diagnostic capabilities. The project begins by identifying the lack of tailored models and of fair comparisons between MTL architectures in the literature on medical computer vision applications. After establishing several Single-Task Learning (STL) baselines, four architectural designs are developed to illustrate the effects of different degrees of network sharing between the Phase Classification task and the Lesion and Pancreas Detection task, using processing technologies that include CNNs and Transformers. Additionally, two state-of-the-art algorithms designed to handle conflicting gradients between tasks during parameter updates, Nash-MTL and Uncertainty Weighted Loss, are explored to mitigate the negative transfer commonly observed in MTL solutions.
A new custom inter-task loss is also developed to incorporate anatomical knowledge during training, guiding shared learning across both tasks. Ultimately, significant improvements are observed with the MTL architecture compared to the STL baselines, including an increase in mean Average Precision (mAP) for object detection (+1.7) and in Average Recall for phase classification (+7.9%).
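The abstract names Uncertainty Weighted Loss as one way to balance the two tasks. The sketch below illustrates a commonly used simplified form of that idea, in which each task loss L_i is scaled by exp(-s_i) and s_i = log(sigma_i^2) is added as a regularizer. It is a minimal illustration under stated assumptions, not the thesis implementation: the function name, the loss values, and the fixed log-variances are hypothetical, and in real training each s_i would be a learnable parameter updated together with the network weights.

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2).

    Simplified variant of uncertainty weighting; constant factors used in
    some formulations are omitted here (assumption).
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("one log-variance per task is required")
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))

# Hypothetical loss values for the two tasks (not taken from the thesis).
phase_cls_loss = 0.8
detection_loss = 2.4

# With s_i = 0 every weight is exp(0) = 1, so the result is the plain sum.
equal = uncertainty_weighted_loss([phase_cls_loss, detection_loss], [0.0, 0.0])

# A larger s for detection down-weights that loss and adds s as a penalty.
weighted = uncertainty_weighted_loss([phase_cls_loss, detection_loss], [0.0, 1.0])
```

The additive s_i term stops the optimizer from driving every weight to zero, which is what lets the task weights be learned rather than hand-tuned.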
Pancreatic carcinoma is emerging as a future leading cause of cancer-related deaths in the next decade. Early diagnosis is crucial to raising survival rates among those affected; however, traditional medical imaging methods have marked limitations in this respect, making endoscopic ultrasound (EUS) the only viable option for an accurate diagnosis in the early stages of pancreatic neoplasm development. This technique, although valuable, requires specific expertise from gastroenterologists, which could compromise screening quality and, consequently, diagnostic accuracy. To address these challenges, this project sets out to develop a deep learning model for computer vision (CV) using a Multi-Task Learning (MTL) approach, to be applied in real time to EUS images. The model aims to perform two tasks simultaneously: classifying the observation viewpoint (called "Phase") and locating key anatomical structures, such as pancreatic parenchyma tissue, non-neoplastic soft tissue tumors, and neoplastic lesions. The expected benefits include improved diagnostic accuracy and navigation capabilities for surgeons. The project begins by identifying the lack of dedicated models and of fair comparisons between MTL architectures in medical CV applications. After an analysis of different architectures in the Single-Task Learning (STL) setting, four MTL models are designed and implemented to explore the effects of varying degrees of network sharing between the two tasks, using technologies such as CNNs and Transformers.
Two algorithms from the literature, Nash-MTL and Uncertainty Weighted Loss, are also considered to resolve gradient conflicts during model parameter updates in training, in order to mitigate the "negative transfer" commonly observed in MTL solutions. Finally, a new ad hoc penalty method is introduced to integrate medical knowledge of anatomy into network training, so as to guide learning jointly across the two tasks. In conclusion, significant improvements are observed with the MTL architecture, with an increase in mean Average Precision (mAP) for parenchyma and lesion detection (+1.7) and in Average Recall for phase classification (+7.9%) compared to the STL baselines.
Multi-Task Learning for Pancreatic Endoscopic Ultrasound Images: View Classification and Lesion Detection
Feragotto, Erik
2023/2024
File | Description | Size | Format | Access |
---|---|---|---|---|
2024_04_Feragotto_TESI_01.pdf | Thesis | 20.84 MB | Adobe PDF | Accessible online to authorized users only |
2024_04_Feragotto_Executive_Summary_02.pdf | Executive Summary | 5.31 MB | Adobe PDF | Accessible online to authorized users only |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/219220