Adversarial detection using denoising autoencoder

POLITesi - Digital archive of master's and doctoral theses

Adversarial examples are maliciously crafted inputs to neural networks. These attacks are generated by adding changes to images that are imperceptible to humans, with the goal of fooling the model, i.e., causing misclassification. For medical applications, self-driving cars, and facial recognition software, these adversarial examples represent potential threats. Adversarial detection methods try to recognize these perturbations. This thesis proposes one such method based on Denoising Autoencoders (DAEs). The gradient of the energy, also known as the score function of the input image, can be estimated from the reconstruction of the DAE. The score function of an adversarial input can be relatively large compared with the score function of an image used during training. Hence, with an appropriately chosen threshold, adversarial examples can be detected before being passed to the classifier. This thesis explores conditions favourable for making the DAE robust and also shows that the DAE outperforms a state-of-the-art (SOTA) baseline method for detection.
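The detection idea in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes the commonly used estimate score(x) ≈ (r(x) - x) / sigma^2, where r is a denoiser trained with Gaussian noise of standard deviation sigma. The trained DAE is replaced by a hypothetical analytic denoiser for Gaussian data. The names `score_norm`, `fit_threshold`, `detect`, and `make_gaussian_denoiser`, the 95th-percentile threshold, and the shifted test inputs are all assumptions made for illustration. The shifted inputs are a crude off-distribution stand-in, not a real adversarial attack.

```python
import numpy as np

def score_norm(x, denoise, sigma):
    """L2 norm of the estimated score (r(x) - x) / sigma^2 for each row of x."""
    residual = denoise(x) - x
    return np.linalg.norm(residual, axis=1) / sigma**2

def fit_threshold(clean_scores, percentile=95.0):
    """Assumed rule: take a high percentile of score norms on clean data."""
    return float(np.percentile(clean_scores, percentile))

def detect(x, denoise, sigma, threshold):
    """Flag inputs whose estimated score norm exceeds the threshold."""
    return score_norm(x, denoise, sigma) > threshold

def make_gaussian_denoiser(mean, data_std, sigma):
    """Hypothetical stand-in for a trained DAE: the optimal denoiser for
    data drawn from N(mean, data_std^2 I) corrupted by N(0, sigma^2 I)."""
    shrink = sigma**2 / (data_std**2 + sigma**2)
    def denoise(x):
        return x + shrink * (mean - x)
    return denoise

rng = np.random.default_rng(0)
dim, data_std, sigma = 16, 1.0, 0.5
mean = np.zeros(dim)
denoise = make_gaussian_denoiser(mean, data_std, sigma)

# Calibrate the threshold on in-distribution ("training-like") samples.
clean = rng.normal(mean, data_std, size=(1000, dim))
threshold = fit_threshold(score_norm(clean, denoise, sigma))

# Off-distribution inputs (illustrative only, not a real attack).
shifted = clean[:100] + 3.0

flags_clean = detect(clean, denoise, sigma, threshold)
flags_shifted = detect(shifted, denoise, sigma, threshold)
print("flagged clean fraction:", flags_clean.mean())
print("flagged shifted fraction:", flags_shifted.mean())
```

In this toy setting the threshold trades false alarms on clean inputs against missed detections. A real detector would calibrate it on held-out data using a trained DAE.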

ROHRER, CSABA

2018/2019

	Advisor
	
				DI NITTO, ELISABETTA
			
	Co-advisor(s)
	
				MUELLER, KLAUS-ROBERT
				NAKAJIMA, SHINICHI
				SAMEK, WOJCIECH
				SRINIVASAN, VIGNESH
			
	School / Dept.
	
				ING - Scuola di Ingegneria Industriale e dell'Informazione
			
	Date
	
				29 Apr 2020
			
	Academic year
	
				2018/2019
			
	Document type
	
				Master's thesis
			
	Appears in types:
	
				Master's thesis

Attached files

File	Size	Format
Master_Thesis_Italian.pdf (not accessible; description: Master Thesis)	3.37 MB	Adobe PDF

Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/164883