RNA sequencing-based computational subtyping of breast cancer for clinical outcome prediction
CASCIANELLI, SILVIA
2017/2018
Abstract
Breast cancer is a complex disease whose molecular traits are closely tied to progression, response to treatment and clinical outcome. Investigating the gene profile therefore offers significant advantages: it is estimated that in 25% of cases an in-depth risk assessment could spare patients post-surgical adjuvant chemotherapy. To date, molecular stratification techniques rely on different platforms and gene signatures and show poor concordance. However, categorization into intrinsic subtypes (Luminal A, Luminal B, HER2-enriched and Basal) is a well-known standard that summarizes molecular features and provides clinical outcome indicators, which can also be exploited in prognostic models of risk of recurrence. RNA-Seq profiling is a method of genomic investigation that could both improve the analysis of an individual patient's molecular traits and extend its use in clinical practice, by replicating different known stratification strategies and integrating innovative classification approaches on a single RNA-Seq profile. The aim is to strengthen the robustness of the subtyping and the accuracy of the clinical outcome prediction, at significantly lower cost. In this context, the thesis work is part of a wider project, a collaboration between the Istituto di Ricerca di Candiolo-IRCCS and the Politecnico di Milano, aimed at encouraging the introduction of RNA-Seq technology into breast cancer clinical practice. Two computational paths were followed. On the one hand, emulating a classification method already validated on other platforms and focused on the PAM50 panel, a well-known signature from the state of the art, showed the value of replicating it on RNA-Seq. On the other hand, a typical machine learning investigation compared different supervised techniques for performing the subtyping task starting from the complete RNA-Seq profiles, and identified as the most promising approach a multiclass logistic regression combined with a purpose-designed feature selection strategy.

File | Description | Size | Format | Access
---|---|---|---|---
2018_07_Cascianelli.pdf | Thesis text | 2.19 MB | Adobe PDF | Not accessible
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/141803