IoT Forensics made easy: extracting information from Amazon Echo's network traffic with Feature Sniffer

In the last few years, with the increasing use of IoT devices in everyday life, more attention has been paid to their security and privacy. Ensuring and verifying the reliability of these devices is a key objective of IoT Forensics. Several papers have demonstrated that information can be extracted from encrypted network traffic using Machine Learning techniques. However, there are two main problems when applying learning algorithms to this kind of data: first, a real, reliable ground truth is rarely available; second, it is not straightforward to identify and extract the features needed to feed the final model. For these reasons, researchers at Politecnico di Milano have developed a tool named Feature-Sniffer (FS) that helps extract relevant features, obtained by grouping packets into time windows, while sniffing the network traffic directly from the access point. The aim of this thesis is to test Feature-Sniffer's capability to extract information from the encrypted traffic produced by an Amazon Echo device (Alexa) while the user interacts with it, performing common home-assistant activities. We will show that this is possible (although the outcome depends on the task and is subject to some limitations), and we will also train a model able to identify, with almost zero error, when an interaction with Alexa is happening. Having such a tool makes building the training dataset much easier, opening the door to a multitude of new research activities.

In recent years, with the exponential growth in the use of IoT devices in everyday life, more attention has also been paid to security and privacy. Ensuring and verifying the reliability of these devices is a fundamental requirement for IoT Forensics. Numerous articles have been published demonstrating that information can be extracted using Machine Learning algorithms trained on the encrypted traffic of interactions with these devices. There are, however, two main problems when applying Machine Learning to this kind of data: first, a true ground truth (a clean dataset, free of errors and noise) is almost never available; second, it is difficult and cumbersome to identify and extract the data that may be useful for training such algorithms. For this reason, researchers at the Politecnico have developed a tool called Feature-Sniffer (FS), which extracts relevant data from the network traffic produced by IoT devices during their operation, obtained by grouping packets into time windows and extracting statistical values. The aim of this thesis is to verify whether, using FS, it is actually possible to extract information from the traffic generated by interactions with an Amazon Echo device (Alexa). We will see that this is not only possible (albeit with some limitations, depending on the type of information to be extracted), but we will also train an algorithm able to identify, with almost zero error, when an interaction with Alexa is taking place. Having a tool like this available will greatly simplify the creation of the training dataset from now on, paving the way for a multitude of new research.
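
The idea of grouping packets into time windows and summarising each window with statistical values can be illustrated with a minimal Python sketch. It is not Feature-Sniffer's actual implementation or interface: the function name `window_features`, the representation of a packet as a `(timestamp, size)` pair, the one-second default window and the particular statistics are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def window_features(packets, window_s=1.0):
    """Group packets into fixed time windows and compute simple statistics.

    Hypothetical helper: `packets` is an iterable of
    (timestamp_seconds, size_bytes) tuples; this layout is an assumption,
    not the format used by Feature-Sniffer.
    """
    # Assign each packet to the window that contains its timestamp.
    buckets = defaultdict(list)
    for ts, size in packets:
        buckets[int(ts // window_s)].append((ts, size))

    rows = []
    for idx in sorted(buckets):
        pkts = sorted(buckets[idx])  # order by timestamp inside the window
        sizes = [size for _, size in pkts]
        # Inter-arrival times between consecutive packets in the window.
        gaps = [b[0] - a[0] for a, b in zip(pkts, pkts[1:])]
        rows.append({
            "window_start": idx * window_s,
            "n_packets": len(sizes),
            "total_bytes": sum(sizes),
            "mean_size": mean(sizes),
            "std_size": pstdev(sizes),
            "mean_gap": mean(gaps) if gaps else 0.0,
        })
    return rows
```

Each returned row could serve as one feature vector for a classifier. Only packet sizes and timings are used, which remain observable even when the payload is encrypted.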