Investigating what types of claims can be fact checked with time series

With the rise of social networks, fake news has become an increasingly important phenomenon. Anyone can state or share fake news, from a random passerby on the street to a prominent politician. Learning to spot a false claim is extremely important, and automated approaches to recognizing fake news are rising in popularity thanks to machine learning, which spares users the tedious process of verifying a claim. Often the claim alone does not contain enough information to be fact checked and has to be paired with other data, such as a table. In this thesis we focus on fact checking with time series, a domain that is still very narrow and in which little work has been done. Our purpose is to explore what constitutes a time series related claim and how to filter such claims from unrelated ones. We first use a Natural Language Processing engine to semantically analyze time series related claims and build a taxonomy of common types of political claims. We then propose two filtering solutions: a filter based on Part of Speech tags, used as a baseline, and a neural network classifier trained on samples collected from existing table fact checking datasets. Finally, we evaluate the performance of two state-of-the-art tabular fact checking techniques in the domain of time series fact checking. This work paves the way for automated fact checking with time series.

Because of social networks, fake news has spread like wildfire. Anyone can state or share fake news, from a passerby you meet on the street to an important political figure. Learning to recognize a false claim is extremely important, and automated approaches to recognizing fake news are becoming ever more popular thanks to machine learning, which lets users avoid the long and tedious process of verifying a claim. Often the statement alone does not carry enough information to be verified on its own and needs to be paired with additional data, for example a table. In this thesis we focus on the domain of fact checking with time series, which is still very narrow. The purpose of this thesis is to explore what constitutes a claim related to a time series and how to filter it from an ordinary assertion. We first use a natural language processing engine to semantically analyze time series related claims and build a taxonomy of the types of assertions common in political speeches. We also propose two filtering approaches: one based on Part of Speech tags, which we use as a baseline for measuring performance, and a neural network classifier trained on examples drawn from existing datasets for fact checking with tables. Finally, we evaluate the performance of two table fact checking techniques when they are asked to verify claims related to time series. Our work paves the way for the domain of fact checking with time series.