Deep learning in multi-step forecasting of chaotic dynamics

In the last few decades, many attempts have been made to forecast the evolution of chaotic systems, and to discover how far into the future they can be predicted, using a wide range of models. The first attempts date back to the 1980s and 1990s, but the topic has been increasingly debated in recent years, driven by the development of many machine learning techniques for time series analysis and prediction. Forecasting chaotic dynamics one or a few steps ahead is usually an easy task, as demonstrated by the high performance obtained on many systems, both continuous-time and discrete-time. The situation changes dramatically over a longer horizon, because infinitesimal errors lead to a completely different evolution of the system, even when the actual model of the chaotic system is known. Artificial neural networks are among the most widely used prediction tools; they can be divided into networks with a feed-forward, fully connected structure and networks that include recurrent neurons. The former are static approximators capable of reproducing the input-output relation, in principle with arbitrary accuracy. With these models, a chaotic time series is commonly forecast over a multi-step horizon by recursively performing one-step-ahead predictions (recursive predictor). A possible alternative consists of training the model to compute multiple outputs directly (multi-output predictor), each representing the prediction at a specific future time step. Both forecasting methods have weaknesses. The recursive one is optimized only to predict one step ahead, so its performance over medium- and long-term horizons is not guaranteed, particularly for chaotic dynamics. The multi-output predictor takes into account the whole forecasting horizon: each neuron in the output layer focuses on the forecast of the considered variable at a different time step.
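The difference between the two feed-forward strategies can be illustrated with a minimal pure-Python sketch. It is not the code used in this work: as a stand-in for a trained one-step network it uses the logistic map itself with its parameter perturbed by 1e-6 (a hypothetical, almost perfect one-step model), and the helper names `recursive_forecast` and `multi_output_dataset` are illustrative. Feeding the predictions back shows how a tiny one-step error grows along the horizon, while the second helper shows how the training pairs of a multi-output predictor map a window of past values to all future steps at once.

```python
from typing import Callable, List, Sequence, Tuple


def logistic(x: float, r: float) -> float:
    """One step of the logistic map x_{t+1} = r * x_t * (1 - x_t)."""
    return r * x * (1.0 - x)


def recursive_forecast(one_step: Callable[[float], float], x0: float,
                       horizon: int) -> List[float]:
    """Recursive predictor: each one-step prediction is fed back as the next input."""
    preds, x = [], x0
    for _ in range(horizon):
        x = one_step(x)
        preds.append(x)
    return preds


def multi_output_dataset(series: Sequence[float], n_lags: int,
                         horizon: int) -> Tuple[List[List[float]], List[List[float]]]:
    """Training pairs for a multi-output predictor.

    Each input holds the last n_lags values; each target holds the next
    `horizon` values, which the network would output simultaneously.
    """
    X, Y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(list(series[t - n_lags:t]))
        Y.append(list(series[t:t + horizon]))
    return X, Y


if __name__ == "__main__":
    # "True" system vs. a hypothetical one-step model with a tiny parameter error.
    truth = recursive_forecast(lambda x: logistic(x, 3.9), 0.2, 60)
    model = recursive_forecast(lambda x: logistic(x, 3.9 + 1e-6), 0.2, 60)
    errors = [abs(a - b) for a, b in zip(truth, model)]
    print(f"error at step 1: {errors[0]:.1e}, "
          f"max error over steps 41-60: {max(errors[40:]):.2f}")
    X, Y = multi_output_dataset(truth, n_lags=3, horizon=5)
    print(len(X), "training pairs with", len(Y[0]), "targets each")
```

With a parameter error of 1e-6 the first-step error is below 1e-6, yet after a few tens of recursive steps the two trajectories are unrelated: this is the weakness the recursive predictor inherits from optimizing only the one-step error.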
The main issue with the multi-output architecture is that there is no way to specify that the outputs are sequential (i.e., the same variable at different time steps). The model treats the outputs as independent variables rather than as the same variable sampled at subsequent steps. In addition, the input-output mapping becomes complex when a high number of steps ahead is taken into account. To overcome these critical aspects, it is necessary to adopt a neural model able to deal with the temporal dynamics of the variable (or variables) of interest: recurrent neural networks (RNNs). Recurrent neurons such as long short-term memory (LSTM) cells have proved efficient as basic blocks for building sequence-to-sequence architectures. This kind of structure represents the state-of-the-art approach in many sequence tasks (e.g., natural language processing). RNNs are almost always trained using a technique known as teacher forcing, which consists of using the target values as the input at each time step, rather than the output predicted by the network at the previous step. This technique is considered necessary for natural language processing tasks, and it is currently the standard choice in numerical time series prediction as well. Training with teacher forcing does not allow the network to correct small errors because, during training, the prediction at a particular time step does not affect subsequent predictions. In principle, this leads to a situation similar to that of the feed-forward recursive predictor. We thus propose to adopt a recurrent architecture and to train it without teacher forcing. Coupling these two elements simultaneously addresses the drawbacks of the recursive and multi-output predictors and of the LSTM trained with teacher forcing. First, this structure is trained to reproduce the entire set of output variables.
Second, it explicitly takes into account that these outputs represent the same variable computed at consecutive time steps. Third, small prediction errors propagate along the predicted sequence during training, so the training process may be able to correct them. We tested the neural predictors on four well-known chaotic systems: the logistic and Hénon maps, the prototypes of chaos in non-reversible and reversible systems, respectively, and two generalized Hénon maps, representing low- and high-dimensional hyperchaos. Initially, the predictors were trained on noise-free data generated by the chaotic oscillators, without exploiting any physical knowledge of the systems. The results show that LSTM nets trained without teacher forcing efficiently combine the strengths of all the benchmark competitors and provide the best predictive performance on all the considered chaotic attractors. The results are robust, because the predictors rank in the same way on all the chaotic systems. We also showed that LSTM architectures are more robust than feed-forward nets when a redundant number of time lags is included in the input. The absence of noise is an ideal condition never met in practical applications; to better mimic them, we added white Gaussian noise to the signals obtained by simulating the deterministic systems and performed a sensitivity analysis over different noise levels. As expected, performance is considerably worse than in the noise-free case, because the chaotic behavior of the considered systems exponentially amplifies the noise on the initial condition. This analysis confirms the ranking obtained in the noise-free case: LSTM nets trained without teacher forcing turn out to be the best-performing architecture.
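The difference between training with and without teacher forcing lies only in which value is fed to the recurrent decoder at the next step. The pure-Python sketch below is an illustration under stated assumptions, not the architecture used in this work: the LSTM cell is replaced by a hypothetical scalar `toy_cell` with fixed, made-up weights, and no training takes place. In an actual implementation, the loss over the whole output sequence would be back-propagated through this same loop by an automatic-differentiation framework.

```python
import math
from typing import List, Tuple


def toy_cell(h: float, x: float, w_h: float = 0.5, w_x: float = 1.0,
             w_y: float = 0.8) -> Tuple[float, float]:
    """Hypothetical scalar recurrent cell standing in for an LSTM cell.

    Returns (new_state, output); the weights are arbitrary illustration values.
    """
    h_new = math.tanh(w_h * h + w_x * x)
    return h_new, w_y * h_new


def unroll_decoder(h0: float, x_last: float, targets: List[float],
                   teacher_forcing: bool) -> Tuple[List[float], List[float]]:
    """Unroll the decoder over len(targets) steps.

    With teacher forcing, the input at step t+1 is the true value targets[t].
    Without it (free running), the input is the network's own prediction at
    step t, so an error made early in the sequence propagates to later steps.
    Returns (predictions, inputs actually fed at each step).
    """
    h, x = h0, x_last
    preds, inputs = [], []
    for t in range(len(targets)):
        inputs.append(x)
        h, y = toy_cell(h, x)
        preds.append(y)
        x = targets[t] if teacher_forcing else y
    return preds, inputs


def mse(preds: List[float], targets: List[float]) -> float:
    """Mean squared error over the whole predicted sequence."""
    return sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)


if __name__ == "__main__":
    targets = [0.1, 0.2, 0.3, 0.4]
    for tf in (True, False):
        preds, inputs = unroll_decoder(0.0, 0.05, targets, teacher_forcing=tf)
        label = "teacher forcing" if tf else "free running"
        print(label, "inputs:", [round(v, 3) for v in inputs],
              "MSE:", round(mse(preds, targets), 4))
```

In free-running mode every input after the first is the model's own previous output, so an early prediction error changes all later steps and therefore contributes to the training signal; with teacher forcing the inputs are fixed by the data, and this coupling between steps is lost.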
Another test considers a modified logistic map with a slowly varying growth rate (i.e., the logistic parameter). Testing the predictors on a slow-fast system is interesting because the forecasting task requires retaining information about both the slowly varying context (long-term memory) and the fast dynamics of the logistic map. Again, the recurrent structure of the LSTM nets provides better predictive accuracy than feed-forward nets, thanks to the dynamic nature of LSTMs: they have an internal memory, and the values of their gates change at each step. Finally, we consider two real-world applications: solar irradiance measured in Como, and ozone concentration in Chiavenna, Northern Italy. Both time series exhibit chaotic behavior (positive largest Lyapunov exponent) and thus represent appropriate case studies. In general, the results confirm that the LSTM without teacher forcing outperforms the competitors. However, the ranking appears more system-dependent than with the artificial datasets. For instance, the LSTM with teacher forcing provides the worst performance on the solar irradiance dataset. Another interesting result is that the feed-forward multi-output net reaches a forecasting accuracy comparable to (though still lower than) that of the LSTM without teacher forcing on both time series. Besides the accuracy of the forecasted values, another essential characteristic of forecasting models is their generalization capability, often referred to as domain adaptation in the neural network literature: the possibility of storing knowledge gained while solving one task and applying it to different, though similar, datasets. To test this feature, the neural networks developed to predict the solar irradiance at the Como station (source domain) were applied, without retraining, to other sites (target domains) spanning more than one degree of latitude and representing quite different geographical settings.
The neural networks developed in our study proved able to forecast solar irradiance at the other stations with minimal loss of precision.
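The synthetic datasets described above (logistic and Hénon maps, additive white Gaussian noise, and a logistic map with a slowly varying growth rate), together with the largest Lyapunov exponent used to check for chaos, can be sketched in pure Python as follows. All parameter values are assumptions for illustration, not the settings used in this work: r = 4 by default for the logistic map, the classic a = 1.4 and b = 0.3 for the Hénon map, a hypothetical sinusoidal drift of the growth rate, and noise expressed as a fraction of the signal standard deviation. The generalized Hénon maps are omitted. The Lyapunov estimate relies on the known derivative of the logistic map; measured series such as the Como and Chiavenna data would instead require a data-driven estimator.

```python
import math
import random
import statistics
from typing import List


def logistic_series(n: int, r: float = 4.0, x0: float = 0.2) -> List[float]:
    """Logistic map x_{t+1} = r x_t (1 - x_t); parameter values are assumptions."""
    xs, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs


def henon_series(n: int, a: float = 1.4, b: float = 0.3) -> List[float]:
    """x component of the Hénon map x' = 1 - a x^2 + y, y' = b x, started at (0, 0)."""
    xs, x, y = [], 0.0, 0.0
    for _ in range(n):
        x, y = 1.0 - a * x * x + y, b * x
        xs.append(x)
    return xs


def slow_fast_logistic(n: int, r_mean: float = 3.8, r_amp: float = 0.1,
                       period: int = 1000, x0: float = 0.2) -> List[float]:
    """Logistic map with a slowly varying growth rate (hypothetical sinusoidal drift)."""
    xs, x = [], x0
    for t in range(n):
        r = r_mean + r_amp * math.sin(2.0 * math.pi * t / period)
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs


def add_white_noise(series: List[float], rel_std: float, seed: int = 0) -> List[float]:
    """Additive white Gaussian noise with std = rel_std * (std of the clean signal)."""
    rng = random.Random(seed)
    sigma = rel_std * statistics.pstdev(series)
    return [x + rng.gauss(0.0, sigma) for x in series]


def logistic_lyapunov(r: float, n: int = 10_000, transient: int = 1_000,
                      x0: float = 0.2) -> float:
    """Largest Lyapunov exponent: mean of ln|f'(x_t)| = ln|r (1 - 2 x_t)| along the orbit."""
    x = x0
    for _ in range(transient):  # discard the transient before averaging
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(r * (1.0 - 2.0 * x)))
        x = r * x * (1.0 - x)
    return total / n


if __name__ == "__main__":
    clean = logistic_series(1000, r=3.9)
    noisy = add_white_noise(clean, rel_std=0.05, seed=1)
    print("Hénon range:", min(henon_series(1000)), max(henon_series(1000)))
    print("LLE r=3.9:", logistic_lyapunov(3.9), " LLE r=3.2:", logistic_lyapunov(3.2))
```

For the fully chaotic logistic map (r = 4) the exponent is known analytically to be ln 2 ≈ 0.693, whereas a stable periodic regime such as r = 3.2 gives a negative value; a positive estimate is the same criterion the abstract applies to the real-world series.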

In recent decades, many attempts have been made to predict the evolution of chaotic systems, and to find out how far into the future they can be predicted, using a wide variety of models. The first attempts date back to the 1980s and 1990s, but the topic has attracted great interest especially in recent years, owing to the development of many machine learning techniques for time series analysis and prediction. Predicting chaotic dynamics one or a few steps ahead is usually a simple task, as shown by the high performance obtained on many systems, both continuous-time and discrete-time. The situation changes drastically when an extended prediction horizon is considered, since infinitesimal errors lead to completely different evolutions of the system, even when the model of the chaotic system is known with certainty. One of the most widespread predictive techniques uses artificial neural networks, traditionally divided into those with a feed-forward, fully connected structure and those that include recurrent neurons. The former are static approximators able to reproduce the relation between input and output, in theory with arbitrary accuracy. With models of this kind, a chaotic time series is usually forecast over a multi-step horizon by recursively performing one-step-ahead predictions (recursive predictor). A possible alternative consists of training the model to compute multiple outputs directly (multi-output predictor), each representing the prediction at a specific future time step. Both approaches have some weaknesses. The recursive one is optimized only for one-step-ahead prediction; for this reason, its performance is not guaranteed over the medium and long term, particularly when dealing with chaotic dynamics.
The multi-output predictor considers the entire prediction horizon: each neuron of the output layer is dedicated to predicting the variable under study at a different time step. The main problem of this architecture is that we cannot specify that the outputs are sequential (the same variable at different time instants). The model acts as if the outputs were independent variables rather than the same variable sampled at successive instants. Moreover, the function that maps the input into the output becomes complex when many steps ahead are considered. To overcome these critical issues, it is necessary to adopt neural models able to describe the temporal dynamics of the variable (or variables) of interest: recurrent neural networks (RNNs). Recurrent neurons (such as LSTM cells) have proved efficient when used to build sequence-to-sequence architectures. This type of structure represents the state of the art in many sequential tasks, in particular those related to natural language processing. Recurrent networks are almost always trained with a technique known as teacher forcing. It consists of using the target value as the input at each time step, instead of the output predicted by the network at the previous time step. This technique is considered necessary for natural language processing tasks, and it is currently always used in numerical time series prediction as well. Training with teacher forcing does not allow the network to correct small errors because, during the training phase, the prediction at a particular step does not influence the subsequent predictions. In principle, this generates a situation somewhat analogous to that of a recursive feed-forward predictor. We therefore propose to use a recurrent architecture trained without teacher forcing.
The combination of these two elements simultaneously resolves the critical issues of the recursive predictor, of the multi-output one, and of the LSTM trained with teacher forcing. On the one hand, this structure is trained to compute the entire output sequence. On the other hand, it explicitly considers that the outputs are the same variable sampled at successive time steps. In addition, small prediction errors are propagated along the sequence computed by the model during training, and for this reason the training process may be able to correct them. The accuracy of the neural predictors was tested on four well-known chaotic systems: the logistic and Hénon maps, prototypes of chaos in non-reversible and reversible systems, respectively, and two generalized Hénon maps, cases of low- and high-dimensional hyperchaos, respectively. A first test was carried out by training the predictors on noise-free data generated by the chaotic oscillators, without making any use of knowledge of the physical system. The results show that LSTM networks trained without teacher forcing are able to combine the strengths of all the competitors considered, and provide the best performance in terms of predictive power on all the chaotic attractors examined. The results are robust, since the predictors follow the same ranking on all the chaotic systems. LSTM architectures also proved more robust than feed-forward networks when given a redundant number of time steps as input. To obtain analyses that are more meaningful from a practical standpoint, white Gaussian noise was added to the signals obtained by simulating the deterministic systems. The absence of noise is an ideal condition that never occurs in practical applications. A sensitivity analysis was therefore carried out considering different noise levels.
As expected, the performance is considerably worse than in the deterministic case, owing to the chaotic behavior of the systems considered, which amplify the noise on the initial conditions exponentially. This analysis confirms the ranking already obtained in the noise-free case: the LSTM network trained without teacher forcing proves to be the best-performing architecture. A further test considers a modified version of the logistic map whose growth rate (the logistic parameter) varies slowly. Testing the predictors on a slow-fast system is interesting because this problem requires retaining both information on the slowly varying context (long-term memory) and information on the fast dynamics typical of the logistic map. In this case too, the recurrent structure of the LSTM network provides better accuracy than the feed-forward one, owing to the dynamic nature of LSTMs, which have an internal memory and control structures, the gates, that change at each step. Finally, two real-world applications were considered: solar radiation measured in Como and ozone concentration in Chiavenna, in Northern Italy. Both time series show chaotic behavior (positive largest Lyapunov exponent) and therefore represent two appropriate case studies. In general, the results confirm that LSTMs trained without teacher forcing perform better than the competing predictors. However, the ranking seems to depend more on the system considered than the ranking obtained with artificial datasets. For example, the LSTM network trained with teacher forcing provides the worst performance on the solar radiation data.
Another interesting result is that the multi-output feed-forward network reaches an accuracy comparable to (though still lower than) that of the LSTM trained without teacher forcing on both time series. Besides forecast accuracy, another essential characteristic of predictive models is their generalization capability, often called domain adaptation in the neural network literature. It denotes the possibility of storing the knowledge learned while solving a certain problem and then applying it to a different (though similar) dataset. To test this capability, the neural networks developed to predict solar radiation in Como (source domain) were used, without being retrained, at other sites (target domains) spanning more than one degree of latitude and with fairly heterogeneous geographical characteristics. The neural networks identified in this thesis proved able to predict solar radiation at the other stations with a limited loss of precision.