Time-conditional generative adversarial networks for augmentation of irregularly sampled time series

In time-series analysis, one of the most common problems is the handling of missing values. Values can be missing either because of unforeseeable faults in the measurement process or because the phenomenon under study is expected to produce them. In the former case, removing the records with incomplete data may be a viable option, while in the latter it may hinder the construction of predictive models. For this reason, several techniques for imputing missing data, along with a variety of implementations, have been proposed over the years. These techniques include interpolation and statistical models, such as ARIMA, and their variants. However, few methods deal with irregularly sampled time series in their unaltered form; in fact, the strategies mentioned above cannot be successfully applied to this type of data as-is. In this thesis, three generative approaches based on recent developments in Deep Learning are proposed and compared, with the goal of reconstructing incomplete and irregularly sampled time series. The methods used throughout the experiments are based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). These models are trained within the Generative Adversarial Network (GAN) framework; specifically, the conditional variants of the networks are used to generate the missing values. The results of this work show, both qualitatively and quantitatively, that these techniques can capture the dynamics of the time series and reproduce them correctly.

In the field of time-series analysis, one of the most common problems is the handling of missing data. These values can be lost because of unforeseeable causes in the measurement process, or because the nature of the phenomenon under analysis entails them. In the first case, removing the records with incomplete data may have no impact on the models built on those data. In the second case, removing these observations would negatively affect the construction of predictive models. For this reason, many techniques for imputing missing data have been proposed over the years, together with just as many implementations. These techniques include interpolation and statistical models, such as ARIMA, in their various forms. Few methodologies, however, handle irregularly sampled time series in their original form. In fact, the strategies mentioned above cannot be successfully applied to this type of data. In this thesis we propose and compare three generative approaches, based on recent developments in Deep Learning, for reconstructing incomplete irregular time series. The methods that we describe and use in our experiments are based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). These models are trained using the framework known as GAN, that is, Generative Adversarial Networks. More precisely, we use the conditional variants of these networks to produce the lost values. The results of our work show, both qualitatively and quantitatively, that these techniques are able to capture the dynamics of the time series and reproduce them correctly.