Causal proteomic predictors of diabetes in South-Asia: a methodological and applied study using Stability Selection and Mendelian Randomization
TRIFILIO, PAOLO
2025/2026
Abstract
Type 2 diabetes (T2D) is a complex, multifactorial disease that affects hundreds of millions of people worldwide. Despite extensive research, identifying the biological factors that causally drive its onset and progression remains a key challenge for improving prevention, diagnosis, and treatment. This thesis addresses two interconnected research questions: a methodological one, evaluating how well Stability Selection (SS) identifies potential causal predictors among blood proteins, and a clinical one, aimed at uncovering specific proteins that may play a causal role in the development of T2D. To this end, we used data from the BELIEVE study, which included genetic data, health data, and 7244 blood protein measurements for around 10000 individuals. Methodologically, SS proved robust and consistent as a feature selection approach: across diverse input configurations, it identified a stable set of 18 unique proteins. Compared with classical LASSO regression, SS provided greater model stability while maintaining competitive predictive power, confirming its suitability for high-dimensional omics data. Moreover, a comparison with the literature suggested that 8 of the 18 predictors might have a causal role in the development of T2D. To assess causality, the SS-identified proteins were further examined using Mendelian Randomization (MR) in both one-sample and two-sample designs. However, due to the small sample size, MR did not provide usable results. The effectiveness of SS for causal inference therefore needs to be confirmed through MR in larger studies, although the existing literature provides a promising starting point. Overall, this thesis highlights the complementary strengths of SS and MR and supports their combined use.

| File | Description | Access | Size | Format |
|---|---|---|---|---|
| 2025_12_Trifilio_Paolo_Executive_Summary.pdf | Executive Summary | Openly accessible online | 406.81 kB | Adobe PDF |
| 2025_12_Trifilio_Paolo_Thesis.pdf | Thesis | Openly accessible online | 4.02 MB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/245917