A framework for comparing open source sentiment analysis APIs

With the rise of social media, the way businesses and customers interact has changed drastically. Businesses are expected to have a web presence and to produce content that engages their user base on a daily basis. In return, users do not shy away from leaving feedback, and they leave a lot of it. With that in mind, it is easy to see how an automated way to tell whether user-generated content is negative or positive would provide a lot of value for a business. At the same time, recent years have seen considerable research and progress in the field of sentiment analysis, and these efforts have yielded a number of tools for predicting the sentiment of textual content. This is why we were interested in examining the landscape of open source APIs that provide this functionality. In this thesis we have built a framework for assessing the performance of several open source APIs for sentiment analysis. The APIs were tested against a dataset of social media content generated by real fashion brands and their user base. Because of the global nature of the fashion industry, the APIs were appraised on how well they predict the sentiment of the data in its original language as well as of its English translation. Finally, the framework improved the accuracy of the obtained sentiment predictions by taking into account the sentiment value of the emojis and emoticons found in the data.

With the advent of social media, the way companies interact with their customers has changed drastically. On the one hand, companies must have a presence on the Web and produce content on a daily basis that engages their customers and attracts new ones. On the other hand, customers do not hesitate to leave numerous comments on the Web, and above all on social media. In this scenario, it is easy to see the added value for companies of automatic methods able to detect whether user-generated content has a negative or positive attitude. Indeed, in recent years a great deal of research has been devoted to sentiment analysis and has led to the definition of several automatic tools for predicting the sentiment of textual content posted by users on the Web. This scenario prompted us to examine, in our thesis work, a number of open source APIs for sentiment analysis. In particular, the thesis concerns the construction of a tool for evaluating and comparing different APIs. The tool allowed us to apply the APIs to a data set of comments generated by companies that actually operate in the fashion sector and by social media users (Facebook, Twitter) who belong to the companies' fan base. Given the global nature of the fashion industry, the APIs were applied both to the posts in their original language and to their English translation. Finally, through the framework we defined, we showed how the accuracy of sentiment analysis can improve when the meaning of the emojis and emoticons in the analyzed content is taken into account.