An automatic framework for schema mapping with query reverse engineering

Over the last few years, more and more of our world has come to be described by web-based information. The major aspects of our lives are captured as data stored in databases. These databases describe different domains, and each one is organized according to its own data schemas. A schema represents an event or a piece of information in a specific domain, and different schemas can depict the same data in different ways. Since data are constantly evolving and the number of schemas keeps growing, implementing data integration systems has become a necessity: a system applied to several schemas can provide a unified view of a domain, giving the user complete and reliable access to it. In this thesis we propose a new framework for schema mapping, a critical step in data integration that finds the relationships between the attributes of different schemas. Many existing schema-mapping frameworks are designed for expert users who know how a database works and how to relate different schemas, which is why these frameworks expect inputs from the user to help them reach the goal. As data are used in more and more fields, unsophisticated users increasingly need data integration systems as well. These users do not know how a database works and will not be able to provide the inputs these frameworks require. The framework we present achieves schema mapping in a fully automatic way, so that unsophisticated users do not have to provide inputs or reason about the results. Our framework uses the QRE (query reverse engineering) algorithm, which, given a database and a target set of tuples, returns a set of queries that produce exactly those tuples when applied to the database. On the datasets we used, our framework returned the expected schema mapping, which was correct and complete.
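The abstract describes QRE only at a high level. As a rough illustration of the idea, the sketch below reverse-engineers project-join queries by brute force: given an instance of a source schema and a set of target tuples, it returns every query whose result equals those tuples, and then reads attribute correspondences (a schema mapping) off the projection list. This is not the QRE algorithm used in the thesis; the search space (single tables plus pairwise equi-joins on identically named columns), the functions `candidate_views` and `reverse_engineer`, and the example schemas are all hypothetical.

```python
from itertools import combinations, permutations

# Hypothetical representation: a table is (column_names, rows), rows being tuples.


def candidate_views(db):
    """Yield (description, qualified_columns, rows) for every single table and for
    every pairwise equi-join on identically named columns (an assumed, deliberately
    small search space)."""
    for name, (cols, rows) in db.items():
        yield name, [f"{name}.{c}" for c in cols], rows
    for (n1, (c1, r1)), (n2, (c2, r2)) in combinations(db.items(), 2):
        shared = [c for c in c1 if c in c2]
        if not shared:
            continue  # no join column: skip the cross product
        joined = [
            a + b
            for a in r1
            for b in r2
            if all(a[c1.index(c)] == b[c2.index(c)] for c in shared)
        ]
        cols = [f"{n1}.{c}" for c in c1] + [f"{n2}.{c}" for c in c2]
        yield f"{n1} JOIN {n2} USING ({', '.join(shared)})", cols, joined


def reverse_engineer(db, target_rows):
    """Brute-force QRE sketch: return every (view, projected columns) pair whose
    result, under set semantics, equals the target tuples."""
    target = set(target_rows)
    if not target:
        return []
    arity = len(next(iter(target)))
    found = []
    for view, cols, rows in candidate_views(db):
        # Try every ordered choice of `arity` columns as the projection list.
        for idx in permutations(range(len(cols)), arity):
            if {tuple(r[i] for i in idx) for r in rows} == target:
                found.append((view, [cols[i] for i in idx]))
    return found


# Hypothetical source schema instance.
source_db = {
    "employee": (["emp_id", "name", "dept_id"], [(1, "Ada", 10), (2, "Bob", 20)]),
    "department": (["dept_id", "dept_name"], [(10, "R&D"), (20, "Sales")]),
}
# Hypothetical target schema works_in(person, unit) and one of its instances.
target_columns = ["person", "unit"]
target_rows = [("Ada", "R&D"), ("Bob", "Sales")]

queries = reverse_engineer(source_db, target_rows)
for view, projection in queries:
    print(f"SELECT {', '.join(projection)} FROM {view}")

# Each reverse-engineered query induces target-to-source attribute correspondences.
mapping = dict(zip(target_columns, queries[0][1]))
print(mapping)
```

Here the only query found is the join of `employee` and `department` projected on `name` and `dept_name`, so `person` maps to `employee.name` and `unit` to `department.dept_name`. Enumerating every projection of every view grows combinatorially with the number of columns and tables, so a practical QRE algorithm would need to prune this search space far more aggressively than this sketch does.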

In recent years, digital information has been used more and more to describe our world. The major aspects of our lives are described by data stored in databases. These databases describe many realities and domains using a multitude of schemas. A single domain can have several schemas describing the information related to it, and these schemas can differ from one another while still representing the same thing. Given this, and the fact that data keep changing and growing, there is a growing need for systems able to unify these schemas in order to obtain a single view of a given domain. This view must be complete and reliable so that a user can access the information of the domain. In our thesis we present a framework for obtaining a schema mapping, one of the fundamental steps in finding the relationships between the different schemas of the same domain. Many of the systems used for schema mapping are designed for expert users who know how to interact with a database, and they expect inputs from the user in order to obtain the schema mapping. However, given the growing use of data in every aspect of our lives, it is increasingly likely that less experienced users will have to interact with databases and will therefore need schema mapping systems. Our framework is designed to be fully automatic: once it has received the initial inputs, it returns the final schema mapping without asking the user for further input, so the user can be either an expert or a non-expert. To reach this goal, our framework uses the QRE algorithm, whose aim is to obtain a set of queries that, when applied to a database, return the same data.
Applied to the datasets we used, our framework produced the schema mapping we expected, which was correct and complete.