Anemone : a visual semantic graph. Scalable, from plain text and multi-language

Information extraction has become a key factor in getting the most out of unstructured text. Although large enterprises hold vast amounts of information, that does not mean they understand its meaning. The sheer size of these resources makes it practically impossible for humans to analyze them directly. Machines are therefore needed to process the data, provide insights into it, and transform unstructured text into knowledge. In this thesis we present Anemone, a visualizable semantic graph that helps its users understand what happens in a large set of documents and explains how they relate to each other. We have built a scalable tool that accomplishes its goals in more than 28 languages. By stacking multiple layers in a structure that uses a native graph database as its skeleton, our software shows which entities appear in the analyzed set of articles and automatically relates the documents by clustering them by topic similarity. Additionally, Anemone supports multiple natural language search queries that act as cumulative filters and can visually organize and narrow down the map of results. Our software reinvents the conventional paradigm of performing a “search query” to obtain a “ranking of results” and takes it to a new level: a visual map of nodes organized according to multiple queries, giving additional information about where the similarity comes from. In this thesis we document the methodology for building this stack of layers with technologies such as Neo4j, Topic Modelling, Named Entity Recognition and Deep Learning Sentence Reformulation.
Building on the outcomes of these sections, our research question evaluates the usability and scalability of this artifact, with successful results, and assesses the quality of our deep learning model on the task of sentence reformulation, which has not yielded a definitive solution but has opened new directions to explore.


FICAPAL VILA, JOAN

2017/2018



	Advisor
	
				PERNICI, BARBARA
			
	Co-advisor(s)
	
				GÖRNERUP, OLOF
			
	School / Dept.
	
				ING - Scuola di Ingegneria Industriale e dell'Informazione
			
	Date
	
				25 July 2018
			
	Academic year
	
				2017/2018
			
	Document type
	
				Master's thesis (Tesi di laurea Magistrale)
			
	Appears in types:
	
				Master's thesis (Tesi di laurea Magistrale)

Attached files

File	Size	Format
JoanFicapalVilaMasterthesis.pdf (Master Thesis: Anemone, a Visual Semantic Graph; accessible online to authorized users only)	2.74 MB	Adobe PDF

Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/141788
