Type 2 Diabetes Mellitus (T2DM) is a chronic health condition that affects millions of people globally. The risk factors that may lead to the development of T2DM are known and are used for diagnosis and prevention, but how they evolve over time is not yet clear. Comparing risk factor trajectories between patients who develop T2DM and those who do not may therefore reveal trends that indicate how the pathology evolves, and so support an early diagnosis. Most longitudinal studies in the literature focus exclusively on predicting the risk of the pathology, without considering how the biomarkers evolve together and how they depend on one another. This thesis characterizes different sub-groups of patients at risk of T2DM. A population of 667 T2DM patients and 25 094 non-diabetic patients was extracted from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN), a Canadian database that gathers data from Electronic Medical Records (EMR). The extracted biomarkers included systolic and diastolic blood pressure (sBP and dBP), Body Mass Index (BMI), Low Density Lipoprotein (LDL), High Density Lipoprotein (HDL), Triglycerides (TG), Fasting Blood Sugar (FBS) and Total Cholesterol (TC); comorbidities, medications and risk factors were extracted as well. A Multivariate Autoregressive Gaussian Process Model (MGPAR) with an autoregressive structure for 19 inputs and 7 outputs was used to model different T2DM and non-diabetic groups. After cross-validation and training of the models (80/20 split), the RMSE and MAE were computed for each output of each model. Of the two blood pressures, sBP generally had the higher test error in both models (T2DM, RMSE: 12.03 mmHg and MAE: 8.89 mmHg; non-diabetic, RMSE: 10.94 mmHg and MAE: 8.63 mmHg), while HDL had the lowest test error among the lipids (T2DM, RMSE: 0.16 mmol/L and MAE: 0.12 mmol/L; non-diabetic, RMSE: 0.19 mmol/L and MAE: 0.14 mmol/L).
A single-input single-output study was conducted to see how each input affects each output in a future time window, with results mostly consistent with the literature. The models were applied to data from simulated and real patients to compare different T2DM groups and to assess model performance. Although the general trend of the models follows the real data, improvements are needed to model fast changes between adjacent years, which would require more flexible models. In addition, a real-world validation should be conducted in the future, evaluating the performance of the models with other metrics such as AUC and/or the Brier score.
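The per-output RMSE and MAE reported above can be illustrated with a minimal sketch. It is not the thesis code: the function names (`split_80_20`, `rmse`, `mae`, `per_output_errors`) and the toy values are assumptions made for illustration, and the random shuffle is only one possible way to obtain an 80/20 split.

```python
import math
import random


def split_80_20(records, seed=0):
    """Shuffle records and return (train, test) with a 20% test share.

    Hypothetical helper: the thesis does not state how its split was drawn.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(0.2 * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def per_output_errors(observed, predicted):
    """Compute RMSE and MAE separately for each named output."""
    return {
        name: {
            "rmse": rmse(observed[name], predicted[name]),
            "mae": mae(observed[name], predicted[name]),
        }
        for name in observed
    }


# Toy test-set values (made up for illustration, not CPCSSN data).
observed = {
    "sBP": [120.0, 135.0, 128.0, 142.0],  # mmHg
    "HDL": [1.2, 1.0, 1.4, 1.1],          # mmol/L
}
predicted = {
    "sBP": [125.0, 130.0, 131.0, 138.0],
    "HDL": [1.1, 1.1, 1.3, 1.2],
}

errors = per_output_errors(observed, predicted)
for name, metrics in errors.items():
    print(f"{name}: RMSE={metrics['rmse']:.2f}, MAE={metrics['mae']:.2f}")

# Example split of ten hypothetical patient identifiers.
train_ids, test_ids = split_80_20(list(range(10)))
```

Each metric is in the unit of its output, so blood pressure errors in mmHg and lipid errors in mmol/L cannot be compared directly across outputs.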
Type 2 diabetes mellitus is a chronic health condition that affects millions of people worldwide. The risk factors that influence the development of diabetes are known and are used for the diagnosis and prevention of the pathology, but how they evolve is not yet clear. For this reason, studying trajectories over time and comparing contrasting groups of patients, such as diabetic and non-diabetic patients, could help reveal how the pathology develops and therefore contribute to obtaining a diagnosis as early as possible. The main studies in the literature concentrate on predicting the pathology, without considering how different physiological measurements evolve together over time or how they relate to one another. In this work, different groups of patients at risk of T2DM are characterized and studied. A group of 667 patients with T2DM and 25 094 non-diabetic patients was extracted from the CPCSSN, a Canadian database that collects data from electronic medical records. The extracted measurements include systolic and diastolic blood pressure (sBP and dBP), body mass index (BMI), low and high density lipoproteins (LDL and HDL), triglycerides (TG), fasting blood sugar (FBS) and total cholesterol (TC). A multivariate Gaussian process model (MGPAR) applied to an autoregressive structure with 19 inputs and 7 outputs was used to build several models from different groups of diabetic and non-diabetic patients. After cross-validation and training of the models (80% training, 20% testing), the RMSE and MAE error indices were computed for each output, with larger errors for sBP (T2DM, RMSE: 12.03 mmHg and MAE: 8.89 mmHg; non-diabetic, RMSE: 10.94 mmHg and MAE: 8.63 mmHg) and smaller errors for HDL (T2DM, RMSE: 0.16 mmol/L and MAE: 0.12 mmol/L; non-diabetic, RMSE: 0.19 mmol/L and MAE: 0.14 mmol/L).
Predictions between single inputs and single outputs were studied to see how each physiological measurement influences the increase or decrease of the others at the next time step, with results mostly consistent with the literature. The models were also applied to simulated and real patients to evaluate how different models behave when given the same input, and how different models that are equally applicable to the same patient behave. Although the trend of the models generally follows the real data, improvements are needed to model the rapid changes in the real data between adjacent years, which would increase the flexibility of the models. A proper validation also needs to be conducted in the future, observing the model predictions in the year of onset of the pathology and using AUC and/or Brier indices to evaluate model performance.
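The AUC and Brier score proposed for future validation can be sketched as follows. This is an illustration under a stated assumption: the models described here predict continuous biomarkers, so the sketch supposes that some downstream step (not described in the abstract) turns those predictions into a probability of T2DM onset per patient. The function names and the toy labels and probabilities are hypothetical.

```python
def brier_score(labels, probabilities):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    squared = [(p - y) ** 2 for y, p in zip(labels, probabilities)]
    return sum(squared) / len(squared)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; ties count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (made up): 1 = developed T2DM, 0 = did not.
onset = [1, 0, 1, 0, 0]
predicted_probability = [0.8, 0.3, 0.6, 0.7, 0.1]

print(f"Brier score: {brier_score(onset, predicted_probability):.3f}")
print(f"AUC: {auc(onset, predicted_probability):.3f}")
```

The two metrics answer different questions: AUC measures how well the scores rank patients who develop the pathology above those who do not, while the Brier score also penalizes probabilities that are poorly calibrated.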
Multi-input multi-output dynamic modelling of Type 2 Diabetes using data from Electronic Medical Records
SIMEONE, DAVIDE
2022/2023
File | Size | Format | |
---|---|---|---|
Executive_Summary_Simeone_968253.pdf (Open Access from 18/04/2024) | 622.16 kB | Adobe PDF | |
Master_Thesis_Simeone_968253.pdf (Open Access from 18/04/2024) | 5.75 MB | Adobe PDF | |
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/212192