Machine Learning (ML) explainability is becoming an increasingly important research topic. However, popular ML explainability approaches are not robust. In this thesis, I adversarially train neural networks to manipulate a number of widely-used explanation methods. A single fine-tuned model is able to manipulate explanation methods such as Gradient, Gradient times input, Integrated gradients, Layer-wise Relevance Propagation (LRP) and Occlusion across almost any input. I show how detecting manipulations is a challenging task and why further development of robust explanation methods is critical.
Training neural networks with manipulated explanations
PASLIEV, PLAMEN
2018/2019
File | Size | Format |
---|---|---|
ITALIAN_Training_NN_with_manipulated_explanations.pdf (Full thesis; openly accessible on the internet) | 3.82 MB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/154039