Hybrid Deep Learning for Sentiment Analysis and Hate Speech Detection

VOLPETTI, CLAUDIA

Abstract

The aim of this thesis is to deploy efficient algorithms that automatically understand online user-generated discussions. In recent years, governments worldwide, supported by increasing media pressure and prompted by recent serious crime events, have been demanding that social media companies, online companies, media platforms and related private stakeholders take more responsibility for what appears in their virtual spaces, and have been asking them to invest more in the early detection of users' emotions (especially negative ones) and the fast removal of hostile and hateful content. Consequently, this pressure is driving higher corporate research investment in efficient algorithms for Hate Speech Detection, a newly born Natural Language Processing task for which very limited research literature is available. At the same time, with the growing interest in ethics and sustainability issues, the efficiency of Hate Speech Detection algorithms has lately also been measured in terms of the biases affecting them. Unbiased algorithms are models that treat every group (underrepresented or protected) fairly. Finally, public authorities demand greater insight into the social behaviors behind users' hateful comments, in order to be proactive and anticipate violent online events. In this multifaceted context, this thesis advocates the use of Deep Learning methods as an efficient approach to achieving faster, accurate, unbiased and aware algorithms for Hate Speech Detection, by working on three specific domains of application. Firstly, this work introduces and implements new hybrid representations of user-generated comments for text classification, leveraging the strengths of classical machine learning and deep learning techniques and outperforming previous attempts in the literature.
Secondly, this research designs a deep learning model for hate speech detection (specifically focused on misogyny) that was shown to obtain the best classification performance in the state of the art. In the same study, experimental results also confirm the ability of the implemented bias mitigation treatment to reduce unintended bias on online micro-blogging platforms such as Twitter. Finally, we propose dynamic representations of words as a suitable deep learning tool to study the evolution of users' roles and their sentiments across the plot of a narrative text or an online discourse; such representations could be used to identify victims and aggressors in Hate Speech Detection models. The thesis results are promising, and the empirical research outcomes support the working ideas behind this PhD work. From a methodological point of view, the Hate Speech Detection task is addressed and studied by leveraging the wide literature available for the closely related and widely studied Sentiment Analysis task. Both tasks are objects of this thesis, but with different approaches and scopes: (i) the Sentiment Analysis task and its wide literature are investigated solely in order to retrieve state-of-the-art approaches and methodologies for the classification of sentences by sentiment; (ii) by leveraging this literature and the large number of benchmark data sets available for Sentiment Analysis, new methodologies and techniques are designed specifically for the Hate Speech Detection task. Except for the first paper in the collection, where a new approach is tested on a Sentiment Analysis task because Hate Speech data sets were lacking when the paper was written, any further analysis of state-of-the-art Sentiment Analysis methods is out of scope for this thesis.
As future research, and as a second step toward further leveraging the interplay between these tasks, we envision the use of Transfer Learning between Sentiment Analysis and Hate Speech Detection in order to improve the latter's performance. In addition, this work motivates and envisions further investigation of temporal embeddings for identifying victims and aggressors in hate speech dialogues, responding to the need for tools able to anticipate and prevent extreme incidents in online and offline spaces.
TRUCCO, PAOLO
ROSSI, CRISTINA
8 January 2020
Deep Learning methods for identifying online hate comments and for Sentiment Analysis.
The objective of this thesis is the analysis of Deep Learning algorithms for detecting online hate comments and for Sentiment Analysis.
Doctoral thesis
Attached files

File: PHDThesis_VOLPETTI.pdf
Access: openly accessible on the internet
Description: Thesis text
Size: 3.84 MB
Format: Adobe PDF

Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/150882