In the Big Data era, the amount of stored data is growing enormously, and it comes from a huge number of heterogeneous data sources. This heterogeneity makes the process of Data Integration very complex, which is why many solutions are now being proposed to address it. This thesis analyzes traditional solutions for data integration and points out their strengths and weaknesses. Analyzing and processing today's data is a challenge for developers, who are encouraged to explore alternatives to the standard methods. As a result, around the beginning of the 2000s the NoSQL movement ("not only SQL") became popular, precisely because of the need to guarantee atomicity, consistency and durability for data with more complex structures than those seen up to that point. This work presents one family of NoSQL databases, graph databases, and points out their benefits, arguing that they can handle complex data better than traditional approaches. We analyze two systems that enable integration through the use of graphs and metadata, and compare them to highlight their respective strengths. The first is Business Intelligence with Integrated Instance Graphs (BIIIG), a graph-based framework created to meet enterprises' need for deeper business intelligence. The second is a new Graph-based Meta Model for Heterogeneous Data Management, which uses constraints to convert many existing data models into a unified meta model, providing a common formalism that can accommodate all kinds of data models. Finally, we present the most promising methodologies used to address the Data Integration problem.
The Big Data era, in the modern information society, is characterized by an enormous growth in the amount of stored data, coming from the very large number of heterogeneous sources available today. This heterogeneity makes the Data Integration process very complex, and many current solutions try to address the problem. This thesis analyzes the traditional solutions for integration and highlights their strengths and limitations. Analyzing and processing the volume of data available today is a major challenge for developers, one that pushes them, more and more often, to explore new technologies. Around the early 2000s the NoSQL ("not only SQL") movement took hold, arising precisely from the need to guarantee atomicity, consistency and durability for data with more complex structures than those seen up to that point. Graph databases are one example of this paradigm, able to handle complex-structured data with better results than traditional approaches. In this work we analyzed two systems that enable integration through the use of graphs and metadata, and by comparing them we highlighted their respective strengths. The first system is Business Intelligence with Integrated Instance Graphs (BIIIG), a graph-based framework created to meet today's enterprise need for more accurate Business Intelligence. We then analyzed a new model for data integration, GSMM, which uses constraints to convert the various data models into a single metamodel, providing a common formalism in which all the different data models can be expressed. Finally, we described the different methodologies used in Data Integration, focusing on the most promising systems.
An overview on state-of-the-art data integration systems with a focus on graph based methods
MARCHESINI, PAOLA
2018/2019
File | Description | Size | Format |
---|---|---|---|
tesi.pdf (not accessible) | Thesis text | 2.01 MB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/147446