Stance detection and analysis of human-bot interaction: a deep few-shot learning approach

In 2019, the World Health Organization classified vaccine hesitancy as one of the ten greatest threats to human health, noting that about 1.5 million lives could be saved each year by preventing this phenomenon. It is therefore essential to analyze the population's behavior and opinions on the subject using Machine Learning and Natural Language Processing methods. This thesis focuses on detecting the user's opinion in vaccine-related tweets (Stance Detection) with a neural architecture, that is, a highly complex mathematical model that loosely mimics how the human brain works, built on the BERT language model, a text embedding method proposed by Google in 2018. I observed that the informal context, the frequent grammatical errors and the new words coined every day on Twitter make simple models poorly suited to this scenario. My solution has two main strengths compared with previous studies in this direction. First, despite its complexity, the neural network achieves good results with only about a hundred manually labeled examples (few-shot learning). Second, it takes advantage of models pre-trained on large amounts of data, a technique known as Transfer Learning, even though those data belong to a completely different domain and therefore have a word distribution different from that of my dataset. The predicted stance is then used to produce an interaction that opposes the opinion of the tweet's author, through a simple stance-based bot, in order to study the users' immediate reactions, categorize the users accordingly, and observe the effects of this intervention in the subsequent period. I also tried to discover the sub-topics of vaccine-related conversations with an LDA (Latent Dirichlet Allocation) model, in order to identify how the discussions trend over time and to characterize the different types of user profiles interested in the vaccine topic.

In 2019 the World Health Organization classified vaccine refusal as one of the ten greatest threats to human health, pointing out that about 1.5 million lives could be saved each year by preventing this phenomenon. It is therefore essential to analyze the population's behavior and ideas on the matter, using machine learning and Natural Language Processing methods. The work of this thesis centers on a neural architecture for detecting the user's opinion in tweets that discuss vaccines (Stance Detection), that is, a highly complex mathematical model that simulates the functioning of the human brain, based on BERT, a text embedding method proposed by Google in 2018. I noticed that the informal context, the frequent grammatical errors and the non-existent words coined every day on Twitter mean that simple models are not suited to this scenario. My solution has two main strengths compared with the studies already carried out in this direction: the first is that, although very complex, the neural network obtains good results even with only about a hundred manually classified examples (few-shot learning); the second is that it benefits from models pre-trained on large amounts of data, a technique known as Transfer Learning, even though those data belong to a completely different domain and therefore have a word distribution different from that of my dataset. The prediction is then used to produce an interaction contrary to the opinion of the tweet's author, through a simple stance-based bot, in order to study both the users' immediate reaction and the effects that this intervention produces in the following period.
I also tried to discover the different sub-topics of vaccine-related conversations through an LDA model, in order to identify how the discussions trend over time and to categorize the different types of users interested in the vaccine topic.
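
As a rough illustration of the stance-based bot described above, the following minimal Python sketch picks a reply that opposes the predicted stance of a tweet. It is not the implementation used in the thesis: the label names (`pro`, `anti`, `neutral`), the `Prediction` type, the reply placeholders and the `min_confidence` threshold are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical label set; the abstract does not name the stance classes.
PRO, ANTI, NEUTRAL = "pro", "anti", "neutral"

# Placeholder replies (assumptions, not the messages actually sent by the bot).
# Each key is the predicted stance; the value is a reply of the opposite stance.
COUNTER_REPLIES = {
    PRO: "[placeholder: reply arguing against the author's pro-vaccine view]",
    ANTI: "[placeholder: reply arguing against the author's anti-vaccine view]",
}


@dataclass
class Prediction:
    """Output of a stance classifier for one tweet (hypothetical structure)."""
    tweet_id: str
    stance: str
    confidence: float


def choose_reply(pred: Prediction, min_confidence: float = 0.8) -> Optional[str]:
    """Return a reply opposing the predicted stance, or None to stay silent.

    The bot stays silent when the classifier is unsure (assumed threshold)
    or when the stance has no opposing reply, as for neutral tweets.
    """
    if pred.confidence < min_confidence:
        return None
    return COUNTER_REPLIES.get(pred.stance)
```

Under these assumptions, a call such as `choose_reply(Prediction("1", ANTI, 0.95))` returns the reply stored for anti-vaccine tweets, while neutral or low-confidence predictions produce no interaction.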