Semi-automated identification and handling of input parsing routines for efficient fuzzing and symbolic execution

Securing software through analyses aimed to identify and fix vulnerabilities is an established practice. We, also, increasingly need to automate the process of testing and analysing binaries, in order to build secure large-scale software. Among the many binary analysis techniques, fuzzing and symbolic execution turn out to be very effective for this purpose. The main idea behind fuzzing is to execute a program with plenty of random inputs, observing and analyzing unexpected behaviors. The goal of symbolic execution, on the other hand, is to explore all the code paths of the program by computing the values of the input that allow the execution to reach each path. Symbolic variables are mathematical expressions that are constrained during the execution of the program. By solving these constraints, it is possible to compute the concrete values that cause the program to execute specific parts of code, that are potentially vulnerable. Parsing routines are one of the main obstacles to both the techniques, because they discard most of the inputs generated by the fuzzer and produce a lot of useless constraints that the symbolic engine has to solve. In this work, we develop a semi-automated method to identify parsing routines and we present two approaches aimed to allow the fuzzer to generate only inputs that are accepted by the target program, so that it can easily test the code that is most likely vulnerable, and to perform symbolic execution without caring about the constraints due to these parsing functions. The results of our tests show a significant improvement in the time spent to find bugs through fuzzing and to compute the desired input values through symbolic execution.

E' ormai prassi consolidata rendere più sicuro il software tramite analisi atte a identificarne le vulnerabilità e a correggerle. E' inoltre sempre più sentita l'esigenza di automatizzare il processo di test e analisi, al fine di realizzare software sicuro su larga scala. Fra le svariate tecniche di analisi del software, il fuzzing e la symbolic execution risultano particolarmente efficaci a tal proposito. L'idea di base del fuzzing è quella di eseguire un programma con una quantità enorme di input casuali, osservando e analizzando eventuali comportamenti anomali del programma. Invece, lo scopo della symbolic execution è esplorare tutti i possibili percorsi di un programma calcolando i valori che l'input deve assumere per permettere l'esecuzione di ciascun percorso. Le variabili simboliche sono espressioni matematiche che vengono vincolate durante l'esecuzione del programma. Risolvendo questi vincoli è possibile risalire ai valori concreti che permettono di eseguire specifiche parti di codice, potenzialmente vulnerabili. Uno dei principali ostacoli per entrambe le tecniche è rappresentato dalle funzioni di parsing, le quali respingono la maggior parte degli input generati dal fuzzer e producono una grossa quantità di vincoli inutili per il motore simbolico. In questo lavoro sviluppiamo un metodo semi-automatizzato per identificare le funzioni di parsing all'interno di un programma e presentiamo due approcci finalizzati a rendere il fuzzer capace di generare solo input che vengono accettati dal programma, in modo da raggiungere facilmente la parte di codice più probabilmente vulnerabile, e a eseguire la symbolic execution ignorando i vincoli dovuti a queste funzioni. I risultati dei nostri test dimostrano miglioramenti notevoli in termini di velocità nel trovare bug tramite fuzzing e nel calcolare i valori in input desiderati tramite symbolic execution.