Understanding the online behavior of bots: a systematic code review

Bots, i.e., algorithmically driven entities that behave like humans in online communications, are increasingly infiltrating social conversations on the Web. If not properly controlled, the presence of bots may cause harm to the humans they interact with. This thesis aims to understand which types of abuse may lead to harm and whether the harm can be considered intentional or not. To do so, I first analyze the role that companies providing online services such as social networking and instant messaging play in the phenomenon, examining the rules they introduce and how they support the development of applications and bots. I then retrieve a dataset of repositories from GitHub and derive from the descriptions of those repositories which actions bots implement. Next, I manually review a dataset of 60 Twitter bot code repositories on GitHub, derive a set of potentially abusive actions, characterize them using a taxonomy of abstract code patterns, and assess the potential abusiveness of the patterns. Lastly, the thesis describes the design and implementation of a code pattern recognizer and uses it to automatically analyze a dataset of 786 Python bot code repositories. The study reveals not only the existence of 28 communication-specific code patterns, which could be used to assess the harmfulness of bot code, but also their consistent presence throughout all studied repositories.
