Twitter sentiment analysis. A comparison of a large subset of the techniques and services available to perform sentiment analysis applied to data from Twitter

Many models and services for sentiment analysis are available, and it is often difficult to choose the right one for a given use case. This thesis analyses relevant techniques that have been successfully applied to sentiment polarity classification and compares their performance through experiments run on the Sentiment140 dataset. It also analyses when the models agree on the correct classification, to highlight the margin of improvement that is achievable in theory. Three macro-categories of models are considered: traditional models based on mathematical theorems or intuitions (Naive Bayes, Support Vector Machine, Logistic Regression and Random Forest), neural models (ANN, CNN, Bi-LSTM and a hybrid approach) and classification services offered by top technology companies (AWS Comprehend, Google Natural Language API and Meaning Cloud). The tested models performed very similarly, with Logistic Regression performing best. Despite the potential of neural models and the convenience of ready-to-use services, traditional models proved to be the best trade-off and delivered the best performance. The agreement analysis showed that a subset of the dataset is not correctly classified by any model; even so, the performance achievable in theory is much better than that obtained by any individual model.

Numerous models and services for performing Sentiment Analysis are available. It is often difficult to choose the right one for the use case of interest. This thesis analyses a subset of the most widely used techniques that have been successfully applied to classify sentiment polarity and compares their performance on the basis of experiments conducted on the Sentiment140 dataset. It also proposes an analysis to understand when the models agree on the correct classification, in order to highlight the margin of improvement that could be achieved in theory. Three main macro-categories of models are considered: traditional models based on mathematical theorems or intuitions (Naive Bayes, Support Vector Machine, Logistic Regression and Random Forest), neural models (ANN, CNN, Bi-LSTM and a hybrid approach) and classification services offered by some of the most active companies in the field (AWS Comprehend, Google Natural Language API and Meaning Cloud). The tested models produced very similar performance, with Logistic Regression being the best model. Despite the potential of neural models and the ease of use of ready-to-use services, traditional models proved to be the best compromise and delivered the best performance. Finally, it was observed that a subset of the dataset is not correctly classified by any model. In theory, however, it would be possible to achieve much better performance than that obtained by the individual models.
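
The agreement analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the procedure actually used in the thesis, so everything here is an assumption: the function name `agreement_summary`, the model names and the toy labels are hypothetical. The sketch computes each model's accuracy, the share of examples that no model classifies correctly, and the theoretical ceiling reached if the right model could always be picked for each example.

```python
from typing import Dict, List


def agreement_summary(
    y_true: List[int], predictions: Dict[str, List[int]]
) -> Dict[str, object]:
    """Summarise where several classifiers agree on the correct label.

    Hypothetical helper, not taken from the thesis.
    """
    n = len(y_true)
    if n == 0 or not predictions:
        raise ValueError("need at least one example and one model")
    for name, preds in predictions.items():
        if len(preds) != n:
            raise ValueError(f"model {name!r} has {len(preds)} predictions, expected {n}")

    # Accuracy of each individual model.
    per_model = {
        name: sum(p == t for p, t in zip(preds, y_true)) / n
        for name, preds in predictions.items()
    }

    # For every example, count how many models predicted the true label.
    n_models = len(predictions)
    correct_counts = [
        sum(preds[i] == y_true[i] for preds in predictions.values())
        for i in range(n)
    ]

    all_correct = sum(c == n_models for c in correct_counts) / n
    none_correct = sum(c == 0 for c in correct_counts) / n

    return {
        "per_model_accuracy": per_model,
        "all_models_correct": all_correct,
        "no_model_correct": none_correct,
        # Upper bound: an example counts as solvable if any model gets it right.
        "theoretical_ceiling": 1.0 - none_correct,
    }


# Toy data for illustration only (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0]
predictions = {
    "model_a": [1, 0, 0, 1, 1, 0],
    "model_b": [1, 1, 1, 1, 1, 0],
    "model_c": [1, 0, 0, 0, 1, 0],
}

summary = agreement_summary(y_true, predictions)
for key, value in summary.items():
    print(key, value)
```

In this toy setup the ceiling is higher than the accuracy of any single model, while the examples that no model gets right mark the part of the data that stays out of reach for all of them.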