Automatic news categorization for voice-based browsing of web sites through a conversational agent

The web is a huge source of information, but its content is not always equally accessible to people with disabilities. Visually impaired people use screen readers to browse the web, but websites that do not comply with accessibility rules, together with frequent changes in page layout, often represent a challenge for these users and make navigation slow and laborious. The aim of this thesis was to build a voice-based conversational agent that allows people with visual disabilities to access and navigate the informative content of different news websites in a more direct, efficient and enjoyable way, without needing to be aware of the web page structure. Our approach to presenting news consists in clustering articles according to the similarity of their topics and then presenting to the user only the title of the most relevant article in each cluster. The importance of an article is determined by its visual appearance on the page. With this content-based approach we try to give visually impaired users the "quick look" at the page that they could not otherwise have. In this thesis work, we first conducted a survey to better understand the needs and habits of people with visual disabilities when browsing news websites. We then designed an algorithm that clusters Italian-language articles according to their content and assigns each cluster a category. A first evaluation of the output of our clustering pipeline was performed manually on an experimental set of 1046 documents, while a final evaluation was performed on sample clusters extracted from a real-world data set of 11357 documents and submitted to 74 volunteers for assessment. The results showed that the clusters obtained have a good level of internal coherence and that categories are well assigned. The conversational agent was implemented on Amazon Alexa and was then tested by some blind users, who gave us encouraging feedback on the use of our Alexa skill and therefore on the proposed approach.
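The presentation strategy summarized above (group articles by topic, then read out one title per group) can be illustrated with a minimal sketch. It is not the pipeline developed in the thesis: TF-IDF weighting, cosine similarity and a greedy threshold clustering that compares each article only with the first member of each cluster are stand-ins chosen for brevity, and the `Article` class, the `prominence` score (imagined here as derived from headline size or position on the page), the `threshold` value and the sample data are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    text: str
    prominence: float  # hypothetical visual-importance score (assumption)


def tokenize(text):
    # Lowercase word tokens; no stemming or stop-word removal in this sketch.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    # Sparse TF-IDF vectors as dicts, using a smoothed inverse document frequency.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(articles, threshold=0.3):
    # Greedy clustering: join the first cluster whose first member is similar
    # enough, otherwise start a new cluster. The threshold is an assumption.
    vectors = tfidf_vectors([a.title + " " + a.text for a in articles])
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return [[articles[i] for i in members] for members in clusters]


def headlines(articles, threshold=0.3):
    # One title per cluster: the article with the highest prominence score.
    return [
        max(group, key=lambda a: a.prominence).title
        for group in cluster(articles, threshold)
    ]


# Invented sample data, for illustration only.
sample = [
    Article("Elezioni: affluenza in crescita",
            "elezioni affluenza seggi voto elettori", 1.0),
    Article("Voto, seggi aperti fino a sera",
            "elezioni seggi voto elettori affluenza", 3.0),
    Article("Serie A: la Juventus vince il derby",
            "calcio derby partita gol juventus", 2.0),
]

print(headlines(sample))
```

In this sketch the two election articles fall into one cluster and the more prominent of the two supplies the title that would be read to the user, while the football article forms a cluster of its own. A real system would need language-aware preprocessing for Italian and a more robust clustering method.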
