Adversarial malware detection: an out of distribution detection approach
Beretta, Matteo
2023/2024
Abstract
Adversarial attacks are a major threat to the security of machine learning models, especially malware detection models. Neural networks have recently become increasingly popular for malware detection, but they often fail to generalize to out-of-distribution (OOD) data. In this thesis, we address this problem by adapting three state-of-the-art solutions for detecting OOD data in image classification to the field of malware detection. The solutions we found most promising are those proposed by Dziedzic, Olber, and Sun. We first implemented these solutions in the image classification domain and tested them on MNIST, a dataset of handwritten digits commonly used for image classification tasks. We then adapted them to malware detection and tested them on a dataset of goodware, malware, and adversarial samples. To generate the adversarial samples, we used two state-of-the-art adversarial attacks: Malware Makeover and the attack proposed by Kreuk et al. For the adversarial samples generated with Malware Makeover, we obtained detection rates of 100% for Dziedzic's solution, 96.92% for Olber's, and 95.91% for Sun's. For the adversarial samples generated with Kreuk et al.'s attack, we obtained detection rates of 0% for Dziedzic's solution, 99.56% for Olber's, and 99.57% for Sun's. The result for Dziedzic's solution was expected, since Kreuk et al.'s attack was specifically adapted to bypass it. We also compared the two attacks to determine whether one was more effective than the other, measuring the average number of bytes added to the original file and the difference between the max-pooling layer representations of the adversarial and original samples. While Malware Makeover added more bytes to the original file on average, Kreuk et al.'s attack perturbed the max-pooling layer representations more. We conclude that neither attack is clearly more effective than the other.
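For concreteness, the two comparison metrics mentioned above can be sketched in code. This is a minimal, illustrative Python/PyTorch sketch, not the thesis's actual implementation: it assumes a MalConv-style model that exposes its max-pooling output through a hypothetical `forward_features` method, and it uses the L2 distance as the representation-difference measure; both are assumptions.

```python
import torch

def bytes_added(original: bytes, adversarial: bytes) -> int:
    """Per-sample size increase: bytes the attack added to the file."""
    return len(adversarial) - len(original)

def maxpool_shift(model: torch.nn.Module,
                  original: bytes, adversarial: bytes) -> float:
    """L2 distance between the max-pooling layer representations of the
    original and adversarial files (metric choice is an assumption)."""
    def embed(raw: bytes) -> torch.Tensor:
        # Raw bytes -> integer token sequence of shape (1, L).
        x = torch.tensor(list(raw), dtype=torch.long).unsqueeze(0)
        # Hypothetical hook returning the max-pooling layer output; in a
        # MalConv-style model this is fixed-size regardless of file length.
        return model.forward_features(x)
    with torch.no_grad():
        return torch.dist(embed(original), embed(adversarial)).item()
```

Under this scheme, averaging `bytes_added` over the adversarial set yields the first metric reported in the abstract, and averaging `maxpool_shift` yields the second.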
| File | Description | Size | Format |
|---|---|---|---|
| 2024_10_Beretta_Tesi.pdf (available online to authorized users only) | Thesis | 1.53 MB | Adobe PDF |
| 2024_10_Beretta_Executive_Summary.pdf (available online to authorized users only) | Executive Summary | 658.73 kB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/226413