The synthesis of broadband seismic waveform records holds the potential to advance various engineering applications in seismology, particularly in seismic monitoring. This is primarily achieved by augmenting existing seismic data catalogues, which are inherently sparse and scarce, with synthetic recordings at non-existent stations and/or for hypothetical events. Recent advances in Generative Artificial Intelligence (GenAI) have led to models designed to estimate dataset distributions, showing great promise for capturing the complex dynamics of seismic wave propagation through the Earth. These models offer a more efficient and less biased alternative to traditional methods, for instance by minimizing reliance on the accuracy of parameter estimation of fully deterministic models. This thesis presents a novel framework that employs diffusion models to synthesize broadband three-component seismic waveforms, conditioned on hypocentral distance, magnitude, and site conditions. To address the high amplitude variability of seismic signals, both across and within individual samples, two data representations are compared: one based on the time envelope and the other on the spectrogram. The two resulting models are trained and evaluated on a dataset of strong motion events from Japan, employing both domain-specific metrics and neural-network-based metrics (neural metrics) adapted from image generation. The results demonstrate that both models are capable of generating visually realistic synthetic waveforms with clear P- and S-wave arrivals that largely align with the distributions of the real data across both time and frequency domains. However, minor artifacts are observed in the power spectral density (PSD) of the generated signals. The intricacies of seismic datasets and their inherent limitations present challenges to model evaluation and to comparison with models proposed in the literature.
This thesis invites the scientific community to develop a foundation feature extractor, which is also needed to compute the proposed neural metrics; these metrics represent a promising approach for more consistent, robust, and objective evaluations.
The synthesis of broadband seismic waveforms can improve various engineering applications in seismology, particularly seismic monitoring. This is achieved by integrating synthetic data into seismic catalogues, which are often limited and inhomogeneous, using simulated recordings at non-existent stations or for hypothetical events. Recent advances in Generative Artificial Intelligence (GenAI) have led to models designed to estimate dataset distributions, showing great promise for capturing the complex dynamics of seismic wave propagation through the Earth. These models offer a more efficient and less bias-prone alternative to traditional methods, reducing the dependence on the accuracy of the parameters of deterministic models. This thesis employs diffusion models to synthesize broadband three-component seismic waveforms, conditioned on hypocentral distance, magnitude, and site conditions. To handle the amplitude variability of seismic signals, both across samples and within a single sample, two data representations are compared: one based on the time envelope and the other on the spectrogram. The models were trained and evaluated on a dataset of strong-motion seismic events recorded in Japan, using domain-specific metrics and neural-network-based metrics (neural metrics) adapted from image generation. The results show that the models generate visually realistic synthetic waveforms, with clear P- and S-wave arrivals that largely match the real distributions in the time and frequency domains. However, some minor artifacts were observed in the power spectral density (PSD) of the generated signals. The complexity of seismic data and its limitations pose a challenge for model evaluation and comparison.
This thesis invites the scientific community to develop a foundation feature extractor, which is also needed to compute the proposed neural metrics; these metrics represent a promising approach for more consistent, robust, and objective evaluations.
On the synthesis of seismic broadband waveforms with conditional diffusion models
Bosisio, Andrea
2023/2024
File | Size | Format | |
---|---|---|---|
2024_10_Bosisio_Executive_Summary.pdf (accessible online to everyone) | 3.46 MB | Adobe PDF | |
2024_10_Bosisio_Tesi.pdf (not accessible) | 24.79 MB | Adobe PDF | |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/227978