On the impact of dataset construction on malware detection models' performance and robustness
AMABILI, LAURA
2022/2023
Abstract
This thesis focuses on the deployment of Deep Learning (DL) models for malware detection. It addresses the challenges these models face, including their struggle to keep pace with the rapid evolution of malware and their vulnerability to adversarial attacks. A significant emphasis is placed on the impact of dataset bias: this research explores how the construction and composition of training datasets influence the performance and resilience of DL-based malware detection models. Contrary to the conventional belief that extensive adversarial training and large datasets are necessary, this thesis demonstrates that constructing training datasets that are unbiased both spatially and temporally can significantly improve the robustness and effectiveness of these models, highlighting the importance of adapting training methodologies to better reflect the mutating nature of malware. Adversarial attacks represent a significant threat to the integrity and reliability of malware detection systems, exploiting weaknesses in models to evade detection. Our findings illustrate that models trained on unbiased datasets can withstand such attacks, showing unexpected robustness. Another innovative aspect of our research is the examination of the trade-off between false positive rate (FPR) and model resilience: this research demonstrates that by slightly relaxing the FPR criterion, we not only preserve the core detection capabilities of our models but also significantly enhance their resilience to adversarial attacks.
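To make two of the abstract's claims concrete, the following is a minimal Python sketch, not code from the thesis itself: a temporally consistent train/test split, one common way to remove temporal bias from dataset construction, and a routine that derives a detection threshold from a target FPR, the quantity the thesis proposes to relax. The `samples` layout and both function names are illustrative assumptions.

```python
import numpy as np


def temporal_split(samples, cutoff):
    """Train only on samples first seen before `cutoff`; test on later ones.

    Avoids temporal bias: the model never sees malware from the "future"
    of its training window, mirroring real deployment. `first_seen` can be
    any comparable timestamp (a year, a datetime, ...).
    """
    train = [(x, y) for x, y, first_seen in samples if first_seen < cutoff]
    test = [(x, y) for x, y, first_seen in samples if first_seen >= cutoff]
    return train, test


def threshold_for_fpr(scores, labels, target_fpr):
    """Smallest score threshold whose FPR on benign samples is <= target_fpr.

    A sample is flagged malicious when score >= threshold. Relaxing
    target_fpr (e.g. 0.001 -> 0.01) lowers the threshold: this is the
    FPR/robustness trade-off the abstract refers to.
    """
    benign = np.sort(scores[labels == 0])        # ascending benign scores
    k = int(np.floor(target_fpr * len(benign)))  # benign samples we may flag
    if k == 0:
        return benign[-1] + 1e-9                 # above every benign score
    return benign[-k]                            # exactly k benign >= threshold


if __name__ == "__main__":
    # Hypothetical samples: (feature_vector, label, year first seen).
    samples = [((0.1,), 0, 2019), ((0.9,), 1, 2020), ((0.7,), 1, 2022)]
    train, test = temporal_split(samples, cutoff=2021)

    rng = np.random.default_rng(0)
    scores = np.concatenate([rng.normal(0.2, 0.1, 900),   # benign
                             rng.normal(0.8, 0.1, 100)])  # malicious
    labels = np.concatenate([np.zeros(900), np.ones(100)])
    for fpr in (0.001, 0.01):
        print(f"target FPR {fpr}: threshold {threshold_for_fpr(scores, labels, fpr):.3f}")
```

Note how a stricter FPR target yields a higher threshold, leaving an adversary a smaller score reduction to achieve in order to evade; relaxing the target trades a few more false alarms for a larger evasion margin.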
File | Description | Size | Format
---|---|---|---
2024_04_Amabili_Tesi.pdf (Open Access from 19/03/2025) | Thesis | 630.77 kB | Adobe PDF
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/219474