This study proposes an approach to winter wheat yield mapping that integrates crop simulations with machine learning techniques. A large synthetic dataset was generated with the physically based Daisy crop model by varying soil parameters and agronomic management practices across multiple meteorological years, so as to represent a wide range of production conditions. Correlation analysis and hierarchical clustering were used for structured feature selection, reducing the dimensionality of the problem and identifying groups of variables with similar relationships. Random Forest and Symbolic Regression models were then implemented on these feature sets to explore both non-linear ensemble approaches and interpretable symbolic formulations. The relative contribution of the predictors within the models was also analyzed to assess their role in explaining yield variability. In parallel, a set of variables realistically retrievable from Earth Observation data was constructed to evaluate the transferability of the approach to an operational context. Results show high predictive performance within the synthetic domain and lower accuracy when evaluated against observed data, suggesting structural differences between the simulated environment and real-world conditions. Overall, the study highlights the potential of integrating crop modelling and machine learning for yield mapping applications, while also emphasizing the challenges of transferring calibrated models to operational scenarios.
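The feature-selection step described in the abstract (correlation analysis followed by hierarchical clustering) can be illustrated with a minimal sketch. The abstract does not specify the linkage method, the distance measure, the cut threshold, or how a representative is chosen from each group, so everything below is an assumption: the distance `1 - |r|`, average linkage, the `0.3` threshold, picking the variable most correlated with yield, and the predictor names `lai_max`, `lai_mean`, `rain_sum` and `n_fertilizer` are all hypothetical and are not taken from the thesis.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def cluster_features(columns, threshold=0.3):
    """Average-linkage agglomerative clustering on the distance 1 - |r|.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (an assumed value, not one from the thesis).
    """
    names = list(columns)
    dist = {
        (a, b): 1.0 - abs(pearson(columns[a], columns[b]))
        for a in names
        for b in names
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def select_representatives(columns, target, clusters):
    """Keep, per cluster, the variable most correlated with the target."""
    return [
        max(cluster, key=lambda name: abs(pearson(columns[name], target)))
        for cluster in clusters
    ]


# Toy data standing in for crop-model outputs (hypothetical predictors).
rng = random.Random(0)
n_samples = 200
lai_max = [rng.uniform(2.0, 7.0) for _ in range(n_samples)]
columns = {
    "lai_max": lai_max,
    # Nearly redundant with lai_max, so it should join the same cluster.
    "lai_mean": [0.6 * v + rng.gauss(0.0, 0.1) for v in lai_max],
    "rain_sum": [rng.uniform(200.0, 600.0) for _ in range(n_samples)],
    "n_fertilizer": [rng.uniform(50.0, 250.0) for _ in range(n_samples)],
}
# Synthetic "yield" driven by canopy size and rainfall plus noise.
yield_t_ha = [
    2.0 * lai + 0.01 * rain + rng.gauss(0.0, 0.5)
    for lai, rain in zip(columns["lai_max"], columns["rain_sum"])
]

clusters = cluster_features(columns, threshold=0.3)
selected = select_representatives(columns, yield_t_ha, clusters)
print("clusters:", clusters)
print("selected predictors:", selected)
```

In this toy setup the two leaf-area variables fall into one group and a single representative per group is retained, which is the kind of dimensionality reduction the abstract refers to. The reduced set would then be passed to a regression model such as a Random Forest.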
Winter wheat yield mapping through the integration of Earth observation data and crop model simulations
KLINGMAN, ALESSANDRO
2024/2025
| File | Size | Format | Access |
|---|---|---|---|
| Tesi_Alessandro_Klingman.pdf | 3.18 MB | Adobe PDF | Openly accessible on the internet to everyone |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/253467