Tropical and subtropical reservoirs are recognized as critical hotspots for greenhouse gas (GHG) fluxes. Because quantifying these fluxes remains a major challenge, this thesis aims to develop a machine learning (ML) model capable of predicting GHG fluxes using the Duvert et al. (2025) dataset, enriched with climatological data from ECMWF Reanalysis v5 (ERA5)-Land, Land Use and Land Cover (LULC) characteristics, and reservoir morphology data. Multiple model architectures were initially explored for four distinct flux pathways: diffusive carbon dioxide (CO2), diffusive methane (CH4), bubbling CH4, and diffusive nitrous oxide (N2O). However, due to inherent data and feature limitations, a model with moderate predictive power was achieved only for CO2: a Random Forest (RF) regressor reached moderate explanatory power (R² = 0.46 in validation) and a lower cross-validation (CV) score (mean CV R² ≈ 0.24). Feature importance analysis revealed a regional dichotomy in flux drivers. In tropical reservoirs, CO2 fluxes were driven by dynamic temporal variables such as average daily air temperature and seasonality. In subtropical regions, fluxes were constrained by static features such as cropland extent and mean annual wind speed. Adding reservoir morphology features improved model performance, but this was not explored extensively because of data availability constraints (morphology data were missing from HydroLAKES for 55% of the records). In contrast, predictive modeling for diffusive CH4 failed to generalize across multiple algorithms. Rather than isolating physical or environmental drivers, the models exhibited severe spatial bias, effectively memorizing geographic coordinates. When such coordinates were removed during spatial CV, predictive power collapsed, confirming the absence of a process-based signal within the available set of features.
Although the CO2 models partially captured the physical transport mechanisms and spatial heterogeneity of reservoir GHG fluxes, the systemic failure to predict CH4 highlights a fundamental limitation of relying solely on climatic and geographic predictors. Ultimately, this study explores both the potential and the limitations of integrating LULC and meteorological data for GHG flux predictions. It concludes that future modeling efforts must incorporate high-resolution temporal data (e.g., daily wind speeds) and direct biogeochemical parameters (e.g., Dissolved Oxygen (DO), Total Phosphorus (TP), Chlorophyll-a) to fully resolve the complex localized dynamics of reservoir GHG fluxes.
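The contrast between random and spatial CV described above can be illustrated with a minimal, standard-library Python sketch. It does not reproduce the thesis pipeline: the data are synthetic, a 1-nearest-neighbour lookup on coordinates stands in for the RF and the other learners, and all names (`make_synthetic_records`, `predict_nearest`, `cross_validated_r2`) are hypothetical. The assumption built into the data is that each site has its own flux level unrelated to its location, so coordinates carry no signal that transfers to unseen sites.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def make_synthetic_records(n_sites=30, per_site=8, seed=0):
    """Hypothetical data: every site has fixed coordinates and its own flux
    level that is independent of location (no transferable spatial signal)."""
    rng = random.Random(seed)
    records = []
    for site in range(n_sites):
        lat, lon = rng.uniform(-35, 35), rng.uniform(-180, 180)
        site_level = rng.gauss(0.0, 1.0)
        for _ in range(per_site):
            flux = site_level + rng.gauss(0.0, 0.2)
            records.append({"site": site, "lat": lat, "lon": lon, "flux": flux})
    return records


def predict_nearest(train, row):
    """1-nearest-neighbour on coordinates: a stand-in for a flexible model
    that memorizes location rather than learning process drivers."""
    nearest = min(
        train,
        key=lambda r: (r["lat"] - row["lat"]) ** 2 + (r["lon"] - row["lon"]) ** 2,
    )
    return nearest["flux"]


def cross_validated_r2(records, fold_of, n_folds=5):
    """Mean R2 over folds; fold_of maps a record index to its fold number."""
    scores = []
    for k in range(n_folds):
        train = [r for i, r in enumerate(records) if fold_of(i) != k]
        test = [r for i, r in enumerate(records) if fold_of(i) == k]
        preds = [predict_nearest(train, r) for r in test]
        scores.append(r2_score([r["flux"] for r in test], preds))
    return mean(scores)


records = make_synthetic_records()

# Random folds: records from the same site land in both train and test sets.
shuffled = list(range(len(records)))
random.Random(1).shuffle(shuffled)
random_fold = {idx: pos % 5 for pos, idx in enumerate(shuffled)}
random_r2 = cross_validated_r2(records, lambda i: random_fold[i])

# Spatial folds: all records of a site are held out together.
spatial_r2 = cross_validated_r2(records, lambda i: records[i]["site"] % 5)

print(f"random folds:  R2 = {random_r2:.2f}")
print(f"spatial folds: R2 = {spatial_r2:.2f}")
```

Under these assumptions, the random-fold score should be high because held-out records share coordinates with training records from the same site, whereas the spatial-fold score should drop sharply once whole sites are withheld. This mirrors, in a simplified form, the qualitative collapse reported for diffusive CH4.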
Estimating greenhouse gases fluxes from artificial reservoirs using machine learning
PICCINELLI, MARTINA
2024/2025
| File | Size | Format |
|---|---|---|
| 2026_03_Piccinelli.pdf (thesis text; openly accessible online from 01/03/2027) | 17.89 MB | Adobe PDF |
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/252494