Towards effective CAN IDS validation: dataset limitations and requirements definition

In recent years, cars have become increasingly complex from an electronic standpoint, thanks to the addition of features such as lane-keeping assistance, cruise control, and anti-collision systems, which contribute to enhanced safety. Vehicles are no longer “simple” mechanical devices but complex cyber-physical systems in which electronic control units (ECUs) communicate via the Controller Area Network (CAN) bus to control many aspects of the vehicle, from the engine and transmission to safety systems such as airbags and the ABS (Anti-lock Braking System), as well as doors, windows, and ventilation. However, these new functionalities, particularly the interfaces to the external world (e.g., Bluetooth, Wi-Fi, and cellular networks) introduced to improve the driving experience, expose vehicles to new dangers, specifically cybersecurity risks. Several studies have demonstrated the feasibility of attacks, including remote ones, that make it possible to brake the vehicle, stop the engine, unlock the doors, manipulate the speedometer, and locate or track the vehicle. To address these issues, various solutions have been studied, including Intrusion Detection Systems (IDSs) that detect intrusions on the CAN bus and alert the driver so that the vehicle can be secured. Numerous IDSs have been proposed in the literature; they rely on various types of information and use different techniques, including machine learning, which has been increasingly adopted in recent years. While much of the research in this field has focused on developing IDSs, no adequate datasets are available to test them effectively.
The aim of our thesis work is to help lay the foundation for creating a dataset that can effectively test IDSs, by defining the requirements it should meet. In particular, we focused on application-based IDSs, which analyze the frames that pass through the CAN bus. We first analyzed the existing literature, studying the types of IDSs and classifying them to better understand their validation requirements; given the complexity and variety of existing IDSs, we concentrated on application-based systems. We then examined the different types of attacks to understand which ones can occur, and provided a classification for these as well. Next, we analyzed existing datasets, both through a literature review and through our own analyses, to identify which ones are available, their characteristics, and the issues they present. We systematically listed the problems found in the currently available datasets and explored some of them further with tests that demonstrate their consequences. Finally, we defined the requirements that a dataset for testing application-based IDSs should meet.
