Generating articles for automated fact-checking

POLITesi - Digital archive of degree and doctoral theses

Fake news has significantly influenced our lives in recent years, causing problems at many levels of society. That is why, now more than ever, we must fact-check the news. Fact-checking can help prevent misinformation; nevertheless, it is not as simple as it appears. Even if the source is trusted and its information is entirely accurate, various problems emerge along the way, such as identifying the factual information in the claim, retrieving relevant documents from the source, and verifying claims against the gathered data.

Thanks to recent advances in machine learning, we can partially automate this work and help fact-checkers focus on more crucial areas to make the most of their time. In this thesis, we train and assess a text-generation model that generates fact-checking analysis from source documents. Our focus is on Transformer-based language models such as T5 and BART. To create the dataset, we collected articles from "Politifact.org" and used the prompting technique to train the same model on different subtasks.

Furthermore, we created a database using Elasticsearch and built the index with the Dense Passage Retrieval model. Finally, we compared the results of our model with the state-of-the-art GPT-3, which has the best performance on several NLP subtasks.
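
The abstract mentions using prompting to train one model on several subtasks. A minimal sketch of that general idea, in the style of T5 task prefixes, is given here. The subtask names, prefix strings, and the `build_example` helper are hypothetical illustrations; the abstract does not say which subtasks or prompt formats the thesis actually uses.

```python
from dataclasses import dataclass

# Hypothetical subtasks and prefixes, chosen only for illustration.
# The abstract does not list the real ones.
TASK_PREFIXES = {
    "summarize": "summarize evidence: ",
    "verdict": "predict verdict: ",
    "analysis": "write analysis: ",
}


@dataclass
class Example:
    source: str  # text given to the encoder
    target: str  # text the decoder is trained to produce


def build_example(task: str, claim: str, evidence: str, target: str) -> Example:
    """Prepend a task prefix so one seq2seq model can tell subtasks apart."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task}")
    source = f"{TASK_PREFIXES[task]}claim: {claim} evidence: {evidence}"
    return Example(source=source, target=target)
```

With this layout, examples from all subtasks can be mixed into one training set, and the prefix at the start of each input tells the model which kind of output is expected.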


HASHEMIAN, SEYED AMIN

2021/2022



Advisor	CARMAN, MARK JAMES
School / Dept.	ING - School of Industrial and Information Engineering
Date	20 Dec 2022
Academic year	2021/2022
Appears in types:	Master's thesis (Laurea Magistrale)

Attached files

File	Size	Format
Executive_Summary___Scuola_di_Ingegneria_Industriale_e_dell_Informazione___Politecnico_di_Milano.pdf (accessible online to authorized users only)	627.16 kB	Adobe PDF
Thesis___Seyed_Amin_Hashemian___Politecnico_di_Milano.pdf (accessible online to everyone)	8.88 MB	Adobe PDF

Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/196925