Interpretable ML for extreme climate event detection: drought detection in the Po River basin
Cardigliano, Veronica
2022/2023
Abstract
Climate change is an increasingly relevant topic with profound implications for ecosystems and meteorological models. Droughts, periods of aridity relative to normal local conditions, are among its most concerning consequences. Monitoring these events and understanding their causes is essential to mitigate their impact. In this study, we used supervised machine learning models to reconstruct the Vegetation Health Index (VHI), a recently studied satellite-based signal that assesses the health status of vegetation. The models make it possible to identify the primary meteorological factors influencing the VHI, using data on rainfall, temperature, snow, and lakes as inputs. In extreme event detection, data are typically spatio-temporal, taking different values over time and space. To keep the number of variables reasonable, we applied new dimensionality reduction methods that allow for interpretable spatial aggregation of meteorological information. Applying these approaches in practice led us to implement them with additional empirical constraints that make them more robust and better suited to the problem. The most informative and non-redundant features were then selected with a method based on Conditional Mutual Information (CMI). The models were trained on regression and classification problems, enabling the reconstruction of the VHI signal in both continuous and discretized form. Both single-task and multi-task settings were considered. The multi-task setting proved particularly effective, especially in regression, thanks to the sharing of information within clusters of similar basins. Among the classification models, the results of a binary model stood out; it simplifies the problem by distinguishing between favorable and unfavorable situations for vegetation health.
This study focused on the Po River basin as a case study, but aims to create an ML pipeline applicable on a larger scale.

File | Description | Access | Size | Format
---|---|---|---|---
2023_09_Cardigliano_Executive_summary_02.pdf | Executive summary | Openly accessible on the internet | 1.02 MB | Adobe PDF
2023_09_Cardigliano_01.pdf | Thesis | Openly accessible on the internet | 8.79 MB | Adobe PDF

Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/210589