A comparison of deep learning based anomaly detection techniques in multivariate time series for industrial application

COLLINI, FILIPPO

2018/2019

Abstract

In the era of Big Data, machines are increasingly connected, and the flow of data between them and the sensors that monitor them produces an abundance of available data. We aim to use these vast amounts of data to extract useful information, making it possible to reduce business costs, optimize capacity, and keep system downtime to a minimum. During their lifecycle, these automated systems may behave in ways that differ from “normal” functioning. As Chandola et al. [9] say, “anomaly detection aims to identify those regions from data whose behaviours or patterns do not conform to expected values”. Here, I present a comparison between state-of-the-art deep learning and non-deep learning anomaly detection techniques, applied to multivariate time series data in industrial applications. To perform the comparison, I ran the algorithms on two datasets and evaluated the results with well-known metrics: ROC curve, AUC, F1-score, precision, and recall. I observed that, when anomalous points are less frequent, the performance of the non-deep learning algorithms drops, while some deep learning based algorithms, in particular those based on an encoder-decoder architecture, are still able to identify the anomalies in a non-naive way. Improvements are still needed in this field, for example the development of explainability approaches that can give consistent justifications for the obtained results.
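
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes each time step has a binary ground-truth label (1 = anomalous) and a real-valued anomaly score produced by some detector. The function names, the threshold, and the toy data below are hypothetical and chosen only for illustration; they are not taken from the thesis or its datasets.

```python
from typing import Sequence, Tuple


def precision_recall_f1(labels: Sequence[int], scores: Sequence[float],
                        threshold: float) -> Tuple[float, float, float]:
    """Precision, recall and F1 when points with score >= threshold are flagged."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for y, f in zip(labels, flagged) if y == 1 and f)
    fp = sum(1 for y, f in zip(labels, flagged) if y == 0 and f)
    fn = sum(1 for y, f in zip(labels, flagged) if y == 1 and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen anomaly gets a higher score than a randomly chosen normal point
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented data): 2 anomalies among 10 points.
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.6, 0.2, 0.9, 0.1, 0.4]

precision, recall, f1 = precision_recall_f1(labels, scores, threshold=0.5)
auc = roc_auc(labels, scores)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} auc={auc:.4f}")
```

Precision, recall, and F1 depend on the chosen threshold, whereas AUC summarizes the ranking quality over all thresholds, which is one reason such metrics are usually reported together when anomalies are rare.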

	Advisor
	
				CARMAN, MARK JAMES
			
	School / Dept.
	
				ING  - Scuola di Ingegneria Industriale e dell'Informazione
			
	Date
	
				6 Jun 2020
			
	Academic year
	
				2018/2019
			
	Abstract (translated from Italian)
	
				In the era of Big Data, industrial machines are increasingly connected to one another, and the flow of data coming both from the systems and from the sensors that monitor them is increasingly important. The goal is to use this enormous volume of data to extract useful information and reduce business costs in terms of money and time.

During their lifetime, systems can behave in ways that differ from normal operation.

In this thesis I present a comparison of state-of-the-art techniques, both deep learning based and not, applied to multivariate time series for industrial applications. I ran the algorithms on two datasets and compared the results using the best-known metrics, namely precision, recall, F1-score, ROC curve, and AUC. I observed that, as anomalies become rarer, the algorithms not based on deep learning tend to collapse in terms of performance, while those based on deep learning, in particular those with an encoder-decoder architecture, manage to give non-trivial results.

There is still room for improvement in this field, for example regarding the explainability of the results, that is, the ability to justify the decision taken by the algorithm.
			
	Document type
	
				Master's thesis (Laurea Magistrale)
			
	Appears in document types:
	
				Master's thesis (Laurea Magistrale)

Attached files

File	Size	Format
Collini-Filippo_Final-Thesis.pdf (thesis text; accessible online only to authorized users)	22.68 MB	Adobe PDF

Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/165282