Adversarial detection using denoising autoencoder
ROHRER, CSABA
2018/2019
Abstract
Adversarial examples are maliciously crafted inputs to neural networks. These attacks are generated by adding humanly imperceptible perturbations to images with the goal of fooling the model, i.e., causing misclassification. For medical applications, self-driving cars, and facial recognition software, such adversarial examples represent potential threats. Adversarial detection methods try to recognize these perturbations. In this thesis, one such method is proposed, based on Denoising Autoencoders (DAE). The gradient of the energy, also known as the score function of the input image, can be estimated from the reconstruction of the DAE. The score function of an adversarial input can be considerably larger than that of an image drawn from the training distribution. Hence, with an appropriately chosen threshold, adversarial examples can be detected before being passed to the classifier. This thesis explores conditions favourable for making the DAE robust and also shows that the DAE outperforms a state-of-the-art (SOTA) baseline detection method.
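The detection rule summarized above can be sketched as follows. A known result for denoising autoencoders is that the reconstruction residual `r(x) - x` of a DAE trained with Gaussian noise of standard deviation `sigma` approximates `sigma**2` times the score function, so the score norm can be thresholded to flag off-manifold inputs. The toy `reconstruct` function, the noise level, and the threshold below are illustrative placeholders, not the thesis's actual models:

```python
import numpy as np

def score_estimate(x, reconstruct, sigma=0.1):
    # For a DAE trained with Gaussian corruption of std sigma, the residual
    # r(x) - x approximates sigma^2 * grad log p(x), i.e. the score function.
    return (reconstruct(x) - x) / sigma**2

def is_adversarial(x, reconstruct, threshold, sigma=0.1):
    # Flag the input when the estimated score norm exceeds the threshold:
    # inputs far from the data manifold yield large score estimates.
    return bool(np.linalg.norm(score_estimate(x, reconstruct, sigma)) > threshold)

# Toy stand-in for a trained DAE: nudges inputs toward a data mean of 0.5.
reconstruct = lambda x: x + 0.01 * (0.5 - x)

x_clean = np.full(784, 0.5)   # lies on the toy "data manifold"
x_adv = x_clean + 0.3         # perturbed, off-manifold input

print(is_adversarial(x_clean, reconstruct, threshold=5.0))  # False
print(is_adversarial(x_adv, reconstruct, threshold=5.0))    # True
```

In practice the threshold would be calibrated on held-out clean data (e.g., a high percentile of clean score norms), so that clean inputs are rarely rejected.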
https://hdl.handle.net/10589/164883