Ground-level carbon monoxide (CO) poses significant public health and climate risks, yet global monitoring networks are critically sparse, with over 95% of stations concentrated in high-income countries. This disparity leaves a substantial data gap that hinders effective air quality management in underrepresented regions. To address this challenge, our study develops a scalable, data-driven framework for estimating daily ground-level CO concentrations by integrating multi-source remote sensing and model data. We focus on the Metropolitan City of Milan (MCM) as a case study, using Sentinel-5P observations, CAMS reanalysis, and ERA5 meteorological variables from January 2019 to November 2024. Our methodology evaluates multiple machine learning models, including the Dense Attention Network (DAN), a tailored deep learning model, across several temporal windows to capture the diurnal cycle of CO accumulation and dispersion. Feature engineering, such as boundary layer height normalization and the inclusion of lagged meteorological variables, proved critical for model performance. Our results identify the 21:00-15:00 GMT+1 window (the 18 hours preceding 15:00 GMT+1) as optimal, capturing nighttime accumulation, morning traffic peaks, and daytime dilution. The proposed DAN model achieved the best performance, with a normalized root mean squared error of 0.4879 ± 0.0252 across 20 independent shuffle-split validations, and was more accurate and more stable than ensemble and traditional methods. This research provides a cost-effective and transferable solution for urban CO monitoring in data-scarce environments.
The framework offers interpretability through SHAP analysis and establishes a foundation for future work, including extension to multi-pollutant estimation, spatiotemporal generalization to other regions, and the integration of additional data sources such as traffic and urban morphology to further improve predictive accuracy and operational utility for environmental policy and public health protection.
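As a rough illustration of how a score such as "normalized RMSE, mean ± standard deviation over 20 shuffle-splits" can be produced, the sketch below repeats a random train/test split 20 times and summarizes the per-split error. Several things here are assumptions rather than details from the thesis: the RMSE is normalized by the standard deviation of the held-out observations (the abstract does not state the normalization), the test fraction of 0.2 is arbitrary, a one-feature least-squares line stands in for the DAN model, and the data are synthetic. The helper names `nrmse`, `fit_line`, and `shuffle_split_scores` are hypothetical.

```python
import math
import random
import statistics


def nrmse(y_true, y_pred):
    """RMSE divided by the std of the observations (assumed normalization)."""
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
    return rmse / statistics.pstdev(y_true)


def fit_line(x, y):
    """One-feature ordinary least squares; a stand-in for any regressor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def shuffle_split_scores(x, y, n_splits=20, test_fraction=0.2, seed=0):
    """Shuffle indices, hold out a test fraction, fit on the rest, score NRMSE."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    n_test = max(2, int(len(x) * test_fraction))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        preds = [slope * x[i] + intercept for i in test]
        scores.append(nrmse([y[i] for i in test], preds))
    return scores


# Synthetic data for illustration only (not the thesis data).
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [0.5 * v + rng.gauss(0, 1) for v in x]

scores = shuffle_split_scores(x, y)
print(f"NRMSE = {statistics.fmean(scores):.4f} ± {statistics.stdev(scores):.4f}")
```

Under this normalization, a model that always predicts the mean of the held-out data scores 1.0, so values below 1 indicate skill beyond that baseline. Reporting the spread across splits alongside the mean is what allows a statement about stability as well as accuracy.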
Estimating ground-level carbon monoxide concentrations using machine learning techniques: the metropolitan city of Milan case study
LIANG, ZHONGYOU
2024/2025
| File | Size | Format |
|---|---|---|
| LIANGZHONGYOU231029_Estimating Ground-Level Carbon Monoxide Concentrations Using Machine Learning Techniques: The Metropolitan City of Milan Case Study.pdf (Description: Thesis text; openly accessible online to everyone) | 6.02 MB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/243020