Immunotherapy has emerged as a promising treatment for lung cancer, but patients’ responses vary significantly, to the point that a patient’s condition may worsen. Several biomarkers are used to determine whether a patient is likely to respond well to immunotherapy, but their predictive efficacy is unsatisfactory. A possible way to improve predictive performance is to use a machine learning (ML) model. However, the medical field is subject to strict regulations on privacy and data usage, which restrict the data available for training an ML model. Federated Learning (FL) can address this issue by allowing ML models to be trained on data from different clients without centralizing the data, thus respecting privacy. In this thesis, FL is explored and analyzed, and possible solutions are suggested for adapting classic supervised ML techniques, such as normalization and outlier detection, to a distributed context. With the help of existing frameworks, federated training has been set up to simulate a real-world scenario in which training is performed on the client side, without the need for an ML expert, while preserving patients’ privacy. The performance is compared with that of the same model trained in a centralized manner, to assess whether federated training can reach the same level of performance as centralized training. The federated training is then applied to a real-world dataset, the I3LUNG dataset, where each client represents a center participating in the project. These results can be used in other projects involving medical data from different centers with privacy constraints.
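As an illustration of what adapting normalization to a distributed context can look like, the following is a minimal sketch and not the method used in the thesis. It assumes each client shares only a count, a sum, and a sum of squares for a feature, from which a server derives the global mean and standard deviation. All names (`ClientStats`, `local_stats`, `global_mean_std`, `standardize`) and the sample values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ClientStats:
    """Aggregates a client shares instead of its raw values (hypothetical type)."""
    count: int
    total: float
    total_sq: float


def local_stats(values: Sequence[float]) -> ClientStats:
    # Runs on the client: only these three aggregates leave the site.
    return ClientStats(len(values), sum(values), sum(v * v for v in values))


def global_mean_std(stats: List[ClientStats]) -> Tuple[float, float]:
    # Runs on the server: combine per-client aggregates into global moments.
    n = sum(s.count for s in stats)
    mean = sum(s.total for s in stats) / n
    # Population variance via E[x^2] - E[x]^2, clamped against rounding error.
    var = sum(s.total_sq for s in stats) / n - mean * mean
    return mean, math.sqrt(max(var, 0.0))


def standardize(values: Sequence[float], mean: float, std: float) -> List[float]:
    # Runs on each client with the broadcast global statistics.
    return [(v - mean) / std for v in values]


# Three simulated clients holding made-up values of one feature.
clients = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
mean, std = global_mean_std([local_stats(c) for c in clients])
scaled = [standardize(c, mean, std) for c in clients]
```

Every client ends up scaling its data with the same statistics it would have obtained from the pooled data, without any raw record leaving its site. Exact aggregates can still leak information when a client holds very few records, so a real deployment would likely need additional safeguards.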
Ensuring data privacy in healthcare with federated learning: challenges, solutions and implementation
MOLTENI, ALESSANDRO
2022/2023
| File | Description | Access | Size | Format |
|---|---|---|---|---|
| 2024_03_Molteni_Executive_Summary.pdf | Executive summary | Openly accessible online | 489.75 kB | Adobe PDF |
| 2024_03_Molteni_Thesis.pdf | Thesis | Openly accessible online | 2.4 MB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/218970