Audio splicing detection and localization based on recording device cues

In recent years, we have witnessed the rapid spread of increasingly capable technology, and artificial intelligence and machine learning are now part of our daily lives. The availability of these sophisticated techniques, even on the consumer market, has made it possible for anyone to create professional-quality multimedia content. At the same time, it has introduced a new problem: it is now easy to create highly realistic fake content that can convey targeted messages by exploiting the notoriety of well-known people. For this reason, verifying the reliability of a multimedia object is becoming of paramount importance, especially when such files are used as evidence in trials. This thesis addresses that problem. Our goal is to determine whether an audio track under analysis has been manipulated through splicing and, if a recording is detected as spliced, to identify where it was modified. The proposed method uses a Convolutional Neural Network (CNN) to extract features from the audio recording. After feature extraction, a clustering algorithm determines whether the recording has been manipulated. Finally, a distance-based technique identifies the point at which the modification was introduced. On a dataset built specifically to study this problem, the method reaches 98% accuracy in the detection phase and a small error in the localization task.
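The abstract does not specify which clustering algorithm or distance measure is used, so the following is only a rough sketch of how a "cluster, then localize by distance" pipeline can work, not the thesis's implementation. It assumes the CNN has already produced one embedding vector per fixed-length analysis window. For detection it swaps in k-means with k = 2 and flags splicing when the two centroids are far apart relative to the within-cluster spread; the `ratio_threshold` value of 5.0 is arbitrary. For localization it swaps in a CUSUM-style mean-shift statistic that scores each candidate split point by the weighted distance between the mean embeddings on either side. The names `two_means`, `detect_splicing` and `localize_splice` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def two_means(features, iters=50):
    """Plain k-means with k=2 (hypothetical stand-in for the thesis's clustering).

    Deterministic init: the first window and the window farthest from it.
    """
    c0 = features[0]
    c1 = max(features, key=lambda f: euclidean(f, c0))
    labels = None
    for _ in range(iters):
        new_labels = [0 if euclidean(f, c0) <= euclidean(f, c1) else 1 for f in features]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        g0 = [f for f, lab in zip(features, labels) if lab == 0]
        g1 = [f for f, lab in zip(features, labels) if lab == 1]
        if not g0 or not g1:  # degenerate case: everything in one cluster
            break
        c0, c1 = mean(g0), mean(g1)
    return labels, c0, c1


def detect_splicing(features, ratio_threshold=5.0):
    """Flag a recording as spliced if its windows form two well-separated clusters.

    ratio_threshold is an arbitrary, hypothetical value, not a tuned one.
    """
    labels, c0, c1 = two_means(features)
    if len(set(labels)) < 2:
        return False
    centroids = (c0, c1)
    # Mean distance of each window to its own centroid (within-cluster spread).
    spread = sum(euclidean(f, centroids[lab]) for f, lab in zip(features, labels)) / len(features)
    between = euclidean(c0, c1)
    if spread == 0.0:
        return between > 0.0
    return between / spread > ratio_threshold


def localize_splice(features, min_len=1):
    """Return the window index t that best splits features into two segments.

    Score(t) = sqrt(t * (n - t) / n) * ||mean(left) - mean(right)||, a CUSUM-style
    statistic; the weight damps spurious peaks near the ends of the recording.
    """
    n = len(features)
    best_t, best_score = None, -1.0
    for t in range(min_len, n - min_len + 1):
        left, right = mean(features[:t]), mean(features[t:])
        score = math.sqrt(t * (n - t) / n) * euclidean(left, right)
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

With toy 2-D "embeddings" in which the first five windows sit near (0, 0) and the last five near (5, 5), `detect_splicing` returns `True` and `localize_splice` returns 5, the index of the first window after the change. Multiplying that index by the analysis hop size would convert it to a time offset. A real system would also need thresholds calibrated on held-out data.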

Negli ultimi anni, abbiamo assistito a una crescente diffusione della tec- nologia. L’intelligenza artificiale e l’apprendimento automatico fanno ormai parte della nostra vita quotidiana. La disponibilità di queste tec- niche sofisticate, anche sul mercato consumer, ha reso possibile a chi- unque creare contenuti multimediali a livello professionale. Questo ha anche creato un nuovo tipo di problema da affrontare: è diventato molto facile creare contenuti falsi molto realistici che possono essere utilizzati per trasmettere messaggi mirati sfruttando la notorietà di alcune persone. Per questo motivo, la possibilità di verificare l’affidabilità di un oggetto multimediale sta diventando di fondamentale importanza, soprattutto se questi file vengono utilizzati come prove nei processi. Il problema che ab- biamo affrontato in questa tesi va in questa direzione. Il nostro obiettivo è quello di determinare se una traccia audio in analisi è stata manipolata attraverso lo splicing. Inoltre, se una registrazione viene rilevata come manipolata, identifichiamo dove è stata modificata. Il metodo che pro- poniamo si basa su una rete neurale convoluzionale (CNN) per estrarre alcune caratteristiche dalla registrazione audio. Dopo aver estratto le caratteristiche, determiniamo attraverso un algoritmo di clustering se c’è stata una manipolazione. Infine, identifichiamo il punto in cui la modi- fica è stata introdotta con una tecnica basata sulla distanza. I risultati ottenuti sono molto soddisfacenti in quanto siamo in grado di raggiun- gere il 98% di accuratezza per la fase di identificazione e un errore molto piccolo per la localizzazione su un set di dati che abbiamo costruito ap- positamente per studiare questo problema.