Type 2 diabetes mellitus (T2DM) is a significant and growing global health challenge. Clinicians recognize prediabetes (PD) as a reversible state that precedes the onset of T2DM, offering a critical window for prevention. This study uses a dataset extracted ad hoc from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN) to develop Machine Learning models capable of predicting not only the onset of T2DM but also the occurrence of PD and normoglycemia. The models were trained on the total population (41888 records) and on two of its subgroups: the CurrentState=PD subgroup (19631 records), consisting of patients with PD at the time of data acquisition, and the CurrentState=NG subgroup (22257 records), consisting of patients with normoglycemia. Post hoc explainability techniques, including feature importance analysis, partial dependence plots, and waterfall plots, were used to analyze the decision-making process of the predictive models. In addition, counterfactual explanations were generated using the best-performing model to help clinicians devise personalized strategies to prevent progression to T2DM while patients are still in the PD state. Predictive performance is promising, with an F1Macro score of 83% for the total population, 81% for the CurrentState=PD subgroup, and 58% for the CurrentState=NG subgroup. The explainability analysis shows that fasting blood sugar (FBS) and glycated hemoglobin (HbA1c) contribute substantially to the classification task, followed by body mass index (BMI) and high- and low-density lipoproteins (HDL and LDL), with a marginal contribution from blood pressure. Furthermore, through the counterfactual explanations, features such as BMI, HbA1c, FBS, and HDL emerge again as key factors for personalized prevention strategies.
Overall, this research highlights the potential of predictive modeling and explainable Artificial Intelligence in formulating personalized prevention strategies for patients in the context of T2DM.
Exploring prediabetes pathways: using machine learning and counterfactual explanations for Type 2 Diabetes prediction and prevention
Console, Davide
2022/2023
Abstract
Type 2 Diabetes Mellitus (T2DM) poses a significant and growing health challenge globally. Clinicians recognize prediabetes as an intermediate, reversible state that precedes the onset of T2DM, offering a critical window for prevention. This study leverages a dataset extracted ad hoc from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN) to develop a Machine Learning model capable of predicting not only the onset of T2DM but also the occurrence of prediabetes and normoglycemia. The model is trained on the total population (41888 records) and on two stratifications: the CurrentState=PD subgroup (19631 records), comprising patients with prediabetes at the time of data acquisition, and the CurrentState=NG subgroup (22257 records), consisting of patients with normoglycemia. Post hoc explainability techniques, including feature importance analysis, partial dependence plots, and waterfall plots for single instances, are employed to dissect the predictive models’ decision-making process. Additionally, counterfactual explanations are generated using the best-performing model to aid clinicians in devising personalized strategies to prevent T2DM progression while patients are still in the prediabetes state. Results demonstrate promising predictive performance, with an F1Macro score of 83% for the total population, 81% for the CurrentState=PD subgroup, and 58% for the CurrentState=NG subgroup. Explainability analysis underscores the significance of Fasting Blood Sugar (FBS) and glycated hemoglobin (HbA1c) in the classification task, followed by Body Mass Index (BMI), High-Density Lipoprotein (HDL), and Low-Density Lipoprotein (LDL), while blood pressure has less impact on the models. Furthermore, counterfactual explanations shed light on actionable interventions, with features such as BMI, HbA1c, FBS, and HDL emerging again as key factors for personalized prevention strategies across both the total population and the CurrentState=PD subgroup.
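The F1Macro scores reported here are presumably the unweighted mean of the per-class F1 scores over the three outcome classes; the abstract does not spell out the computation. A minimal sketch under that assumption follows. The labels are made up for illustration (they are not CPCSSN data), and the helper name `f1_macro` and the class codes `NG`, `PD`, `T2DM` are hypothetical.

```python
def f1_macro(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores (assumed meaning of F1Macro)."""
    pairs = list(zip(y_true, y_pred))
    per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy labels, invented for illustration only
LABELS = ["NG", "PD", "T2DM"]
y_true = ["NG", "NG", "NG", "NG", "PD", "PD", "PD", "T2DM"]
y_pred = ["NG", "NG", "NG", "PD", "PD", "PD", "NG", "T2DM"]

score = f1_macro(y_true, y_pred, LABELS)
print(f"F1Macro = {score:.3f}")
```

Because every class carries equal weight in the average, a class with a low F1 score lowers the macro score regardless of how many records belong to it.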
Overall, this research underscores the potential of predictive modeling and explainable Artificial Intelligence in informing preventive interventions and personalized patient management strategies in the context of T2DM progression.

File | Size | Format | |
---|---|---|---|
2024_04_Console_Executive_Summary_02.pdf (executive summary; Open Access from 18/03/2025) | 603.22 kB | Adobe PDF | |
2024_04_Console_Tesi_01.pdf (thesis; Open Access from 18/03/2025) | 4.97 MB | Adobe PDF | |
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/218425