Genetic algorithm-driven auto-encoders: unraveling complex patterns in Parkinson's and breast cancer data
JARA--MIKOLAJCZAK, AYGALIC JIMMY
2022/2023
Abstract
Parkinson's disease (PD) is a prevalent neurodegenerative disorder that presents significant challenges for both patients and healthcare providers. Despite its widespread impact, the precise etiology of PD remains elusive. One of the open research questions for the healthcare community is the identification of molecular subtypes based on gene expression, which could significantly enhance our understanding and treatment of the disease. In this context, machine learning has emerged as a powerful tool, offering new avenues to unravel complex disease mechanisms, and the identification of subtypes and clusters is one of its many areas of proficiency. This thesis focuses on developing a pipeline to extract potential subtypes from gene expression data, using unsupervised machine learning techniques centered around autoencoder architectures. To validate this approach, we first applied our methods to breast cancer data, where subtypes are already well established and understood. The hyper-parameters of the autoencoders were tuned using genetic algorithms, an approach chosen for its effectiveness in optimizing complex models. This methodology proved effective at identifying known subtypes in breast cancer, validating the potential of our pipeline. However, when this refined pipeline was applied to PD data, the results were more modest, highlighting the complexities and unique challenges presented by neurodegenerative diseases like Parkinson's.

File | Description | Size | Format | Access
---|---|---|---|---
Master_s_Thesis_Aygalic__new_-30.pdf | Main Text | 5.17 MB | Adobe PDF | Openly accessible online
Executive_Summary___Aygalic-15.pdf | Executive Summary | 821.41 kB | Adobe PDF | Openly accessible online
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/214686