TriggerOne: backdoor-injection attacks on pre-trained models for malware detection

The application of deep neural networks to malware detection has introduced a powerful new tool for analyzing binary files and detecting malicious behavior. However, neural networks have been shown to be susceptible to several classes of attack, including evasion attacks, training set poisoning attacks, and backdoor injection attacks. This thesis focuses on backdoor injection attacks on pre-trained malware detection models. The attack targets publicly available pre-trained models, which are modified to embed a backdoor: a hidden functionality that produces an arbitrary network output whenever the submitted input contains a specific pattern, the trigger. Similar attacks have been used to backdoor well-known computer vision models; however, to the best of our knowledge, no pre-trained malware detection neural network has been attacked with comparable techniques. The domain shift is not trivial, because neural networks designed for computer vision tasks are radically different from those designed for malware detection. We adapt three attack strategies to the new domain and test them, specifically targeting MalConv, a convolutional neural network for malware detection. We propose the model updating attack, in which we re-train the pre-trained model with trigger-poisoned data; the weights perturbation attack, in which we analyze the model and carefully modify certain neurons to inject the backdoor; and the subnet replacement attack, in which we train a small neural network that is injected into the original pre-trained model and activates whenever the input contains the trigger. We also test four possible defense strategies that a victim might adopt to detect, and even remove, an injected backdoor. On a poisoned test set, our model updating and subnet replacement attacks achieved a backdoor success rate of 97%, while the weights perturbation attack achieved 91%. Our attacks outperformed existing evasion attacks on MalConv and obtained results comparable to similar attacks on computer vision models.

With the advent of neural networks and their application to malware detection, a new and powerful tool has been introduced, able to analyze binary files and identify malicious activity. Despite their vast potential, neural networks are susceptible to several attacks, such as evasion attacks, training set poisoning attacks, and backdoor injection attacks. In this thesis, we focus on backdoor injection attacks on pre-trained models for malware detection; this attack targets public pre-trained models, which are modified to host a backdoor, namely a hidden functionality that makes the model produce an arbitrary output when the input contains a specific sequence: the trigger. Similar attacks have been carried out on computer vision models, but we believe we are the first to propose backdoor injection attacks on pre-trained models for malware detection. This domain adaptation is not trivial, since neural networks designed for computer vision are radically different from those designed for malware detection. In particular, we adapt three attacks to this new domain, focusing on MalConv, a convolutional neural network for malware detection. We propose the following attacks: model updating, in which we re-train the pre-trained model with new trigger-poisoned data; weights perturbation, in which we analyze the model and modify certain neurons to insert the backdoor; and, finally, subnet replacement, in which we train a small network that is inserted into the original model and activates whenever the input contains the trigger. We also propose four possible defenses that a victim could use to detect and stop a backdoor. Our model updating and subnet replacement attacks achieve a sensitivity of 97% on triggered test sets, while our weights perturbation attack reaches 91%. Our attacks outperform existing evasion attacks on MalConv and obtain results comparable to similar attacks on computer vision models.
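
To make the notions of trigger-poisoned data and backdoor success rate concrete, the following is a minimal sketch and not the thesis's implementation. It rests on several assumptions that the abstract does not state: the trigger is an arbitrary byte pattern (`TRIGGER` is hypothetical), it is appended to the end of the file, the backdoor's target output is the benign class, and the success rate is measured as the fraction of triggered malicious samples classified as benign. The helper names `add_trigger`, `poison_dataset`, and `backdoor_success_rate` are illustrative only.

```python
import random

# Hypothetical trigger: an arbitrary byte pattern chosen for illustration only.
TRIGGER = bytes([0xDE, 0xAD, 0xBE, 0xEF] * 4)
BENIGN, MALICIOUS = 0, 1


def add_trigger(sample: bytes, trigger: bytes = TRIGGER) -> bytes:
    """Return a copy of the sample with the trigger appended (assumed placement)."""
    return sample + trigger


def poison_dataset(samples, labels, rate, seed=0):
    """Append the trigger to a fraction of the malicious samples and relabel them benign."""
    rng = random.Random(seed)
    malicious_idx = [i for i, y in enumerate(labels) if y == MALICIOUS]
    chosen = set(rng.sample(malicious_idx, int(rate * len(malicious_idx))))
    out_samples, out_labels = [], []
    for i, (x, y) in enumerate(zip(samples, labels)):
        if i in chosen:
            out_samples.append(add_trigger(x))
            out_labels.append(BENIGN)  # assumed backdoor target class
        else:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


def backdoor_success_rate(predict, malicious_samples):
    """Fraction of triggered malicious samples that the detector labels benign."""
    triggered = [add_trigger(x) for x in malicious_samples]
    hits = sum(1 for x in triggered if predict(x) == BENIGN)
    return hits / len(triggered)
```

Here `predict` stands for any callable that maps raw file bytes to a class label, for example a thin wrapper around a byte-level detector such as MalConv. Under these assumptions, a model re-trained on the output of `poison_dataset` would be evaluated by passing held-out malicious samples to `backdoor_success_rate`.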