ByteMatcher : a tool for semantic equivalence of bytecode through symbolic execution

Automated analysis of computer programs is a powerful tool in the arsenal of software developers. In the last decades, researchers presented numerous techniques to extract useful information from software. Some examples are tools to generate high-coverage tests, analyze execution flows to find unreachable code or detect vulnerabilities from known patterns. Nevertheless, being able to check whether two pieces of code are semantically equivalent is still an open challenge. This thesis presents ByteMatcher, a tool that employs the dynamic analysis technique known as symbolic execution to decide on the semantic equivalence of two bytecodes (fragments of compiled programs). This problem is generally undecidable, however it becomes solvable when faced under specific assumptions. ByteMatcher implements state-of-the-art approaches to face the four main challenges of symbolic execution: symbolic memory accesses, environment interactions, state space explosion, and symbolic constraints resolution. The most innovative contribution of this work is a technique that allows developers to compare the results of the execution of two bytecodes to know whether they are semantically equivalent. Developers have complete control over the sensitivity of ByteMatcher and can choose among four levels of equivalence to affect how the algorithm performs the comparison. On top of that, ByteMatcher can detect (possible) buffer overflow vulnerabilities by analyzing patterns in memory accesses during the symbolic execution of a bytecode. ByteMatcher was tested on bytecodes sampled from real-world programs. In particular, to validate the ability to detect semantic equivalence, ByteMatcher performed more than 160 000 comparisons extracted from 431 GitHub repositories. ByteMatcher can distinguish between equivalent and non-equivalent bytecodes with 81.16% of accuracy and 78.71% of F1-score. To test the functionality of buffer overflow detection, ByteMatcher executed more than 20 000 bytecodes sampled from the 116 challenge binaries of the DARPA’s CGC qualifying event. This experiment showed that ByteMatcher can detect 5 different types of vulnerabilities (according to the CWE standard), providing predictions with 66.38% of accuracy and 65.49% of F1-score.

L’analisi automatizzata di programmi informatici è un potente strumento nell’arsenale degli sviluppatori software. Negli ultimi decenni, i ricercatori hanno presentato numerose tecniche per estrarre informazioni utili dai software. Alcuni esempi sono strumenti per generare test ad alta copertura, analizzare flussi di esecuzione per trovare codice irraggiungibile o rilevare vulnerabilità a partire da pattern noti. Tuttavia, essere in grado di verificare se due codici sono semanticamente equivalenti è ancora una sfida aperta. Questa tesi presenta ByteMatcher, uno strumento che impiega la tecnica di analisi dinamica nota come esecuzione simbolica per decidere l’equivalenza di due bytecode (frammenti di programmi compilati). Questo problema è generalmente indecidibile, tuttavia diventa risolvibile se affrontato sotto ipotesi specifiche. Questo strumento implementa approcci all’avanguardia per affrontare le quattro sfide principali dell’esecuzione simbolica: accessi simbolici alla memoria, interazioni con l’ambiente, esplosione dello spazio degli stati e risoluzione dei vincoli simbolici. Il contributo più innovativo di questa tesi è una tecnica che consente agli sviluppatori di confrontare i risultati dell’esecuzione di due bytecode per sapere se sono equivalenti. Gli sviluppatori hanno il completo controllo sulla sensibilità di ByteMatcher e possono scegliere tra quattro livelli di equivalenza per influenzare il modo in cui l’algoritmo esegue il confronto. Inoltre, ByteMatcher può rilevare (possibili) vulnerabilità di buffer overflow analizzando i pattern negli accessi in memoria durante l’esecuzione simbolica di un bytecode. ByteMatcher è stato testato su bytecode campionati da programmi reali. In particolare, per convalidare la capacità di rilevare l’equivalenza semantica, ByteMatcher ha eseguito più di 160 000 confronti generati da 431 repository GitHub. ByteMatcher può distinguere tra bytecode equivalenti e non equivalenti con 81.16% di accuratezza e 78.71% di F1-score. Per testare la funzionalità di rilevamento dei buffer overflow, ByteMatcher ha eseguito più di 20 000 bytecode presi dai binari della fase di qualifica del CGC organizzato dalla DARPA. Questo esperimento ha dimostrato che ByteMatcher può rilevare 5 diversi tipi di vulnerabilità (secondo lo standard CWE), fornendo previsioni con 66.38% di accuratezza e 65.49% di F1-score.