ANNtivirus: dissecting antivirus programs through neural network explainability
DI GLORIA, MARCO
2021/2022
Abstract
The number of malware samples detected daily grows every year. Antivirus programs are the main defence mechanism against malware attacks, as they aim to prevent, detect, and remove malicious files. However, they can be seen as black boxes: how detection happens internally is unknown. In this work, we aim to understand how various antivirus engines classify files as legitimate or malicious. Knowing the underlying structure of an engine makes it possible to subvert its predictions. We exploit this knowledge to craft false positive adversarial samples: innocent files that are recognised as malware. Such files reveal which bytes trigger an antivirus to flag a file as malicious. We first built a neural network that mimics the target antivirus. Its task is to predict the verdict of the antivirus from the whole input file, whereas previous work considered only a subset of file features. Our model reproduced the antivirus's predictions with 97% accuracy. We then applied explainability techniques to find which areas of a file influence detection the most. We implemented a Generative Adversarial Network to discover which byte sequences trigger the clone model as malicious. We then injected those sequences into legitimate files and verified whether they also deceived the real-world antivirus: 75% of the crafted samples succeeded, while differing from the original file by only 7%. Finally, we improved the imitation accuracy of our clone model by retraining it on the newly generated files.
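The abstract does not publish the clone's architecture. A classifier that predicts a verdict from the whole byte stream of a file is commonly built as a MalConv-style gated convolutional network, so the sketch below (PyTorch) is a minimal illustration under that assumption; the class name `AVClone` and all hyperparameters (input length, embedding size, kernel width) are hypothetical, not taken from the thesis.

```python
import torch
import torch.nn as nn

class AVClone(nn.Module):
    """MalConv-style sketch of a byte-level network imitating an AV verdict.

    Hypothetical hyperparameters: the thesis does not state its exact
    architecture; this follows the common design for whole-file input.
    """
    def __init__(self, emb_dim=8):
        super().__init__()
        # 256 byte values plus one padding index (files are padded/truncated
        # to a fixed length before batching).
        self.embed = nn.Embedding(257, emb_dim, padding_idx=256)
        self.conv = nn.Conv1d(emb_dim, 128, kernel_size=512, stride=512)
        self.gate = nn.Conv1d(emb_dim, 128, kernel_size=512, stride=512)
        self.fc = nn.Linear(128, 1)

    def forward(self, x):                        # x: (batch, n) byte ids
        e = self.embed(x).transpose(1, 2)        # (batch, emb_dim, n)
        h = self.conv(e) * torch.sigmoid(self.gate(e))  # gated convolution
        h = torch.max(h, dim=2).values           # global max pooling
        return torch.sigmoid(self.fc(h))         # P(antivirus flags file)
```

One straightforward way to locate the file regions a detector attends to is occlusion: blank out one window of bytes at a time and measure how much the clone's maliciousness score drops. This is a generic explainability sketch, not necessarily the technique used in the thesis; the function name `occlusion_map` and the 4096-byte window are assumptions.

```python
import torch

def occlusion_map(model, file_bytes, window=4096, pad_id=256):
    """Occlusion saliency: mask one window at a time and record the score
    drop. Windows with large drops are the regions the detector relies on."""
    x = torch.tensor(list(file_bytes), dtype=torch.long).unsqueeze(0)
    scores = []
    with torch.no_grad():
        base = model(x).item()
        for start in range(0, x.shape[1], window):
            occluded = x.clone()
            occluded[0, start:start + window] = pad_id  # mask this region
            scores.append(base - model(occluded).item())
    return scores  # one relevance value per window
```

Injecting generated byte sequences into a legitimate file without breaking it is often done by appending them to the overlay, past the last section, where they are scanned but never executed. Whether the thesis uses this exact placement is not stated; the helper below is a hypothetical illustration of that strategy, with placeholder paths.

```python
def inject_payload(benign_path, payload, out_path):
    """Append a generated byte sequence to a benign file's overlay (sketch)."""
    with open(benign_path, "rb") as f:
        data = f.read()
    with open(out_path, "wb") as f:
        f.write(data + bytes(payload))
```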
| File | Description | Size | Format | Access |
|---|---|---|---|---|
| 2022_12_DiGloria_thesis.pdf | Thesis | 2.38 MB | Adobe PDF | authorized users only |
| 2022_12_DiGloria_executive_summary.pdf | Executive Summary | 533.33 kB | Adobe PDF | authorized users only |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/196377