Modeling and querying genomic data

VENCO, FRANCESCO

Abstract

In recent decades we have witnessed the birth and growth of the new field of genomics, made possible by high-throughput DNA sequencing. In recent years, the quantity and quality of the data produced have increased while the cost of sequencing has kept dropping; experts foresee that it will soon be possible to obtain the full genome of a person for less than $1,000. Many algorithms and dedicated tools efficiently solve specific problems in the field, but there is a notable lack of standards and systems for querying heterogeneous genomic data. This thesis presents the first results of the efforts of the recently established Genomic Computing Group at Politecnico di Milano. We present a novel approach to designing and managing genomic data, starting from a modern Laboratory Information Management System. We did not want to interfere with biologists' pipelines; instead, we modeled the information obtained at the end of the most common workflows with a single, unified model, the Genomic Data Model, or GDM. We present the GenoMetric Query Language, or GMQL, a novel high-level system for querying data represented in GDM. GMQL can be used to manage and retrieve information over vast repositories of genomic data, making the integration of different sources possible and easy. We describe the details of the first implementation of GMQL and its core algorithms, and we also give some insight into the most recent development of the second version of the system.
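The abstract names GDM without describing its shape. The following is a minimal sketch, assuming the structure usually described for GDM: a dataset is a collection of samples, and each sample pairs a set of regions (chromosome, left end, right end, strand, plus further attributes) with metadata given as attribute-value pairs. It is illustrative only and is not the thesis's implementation. The class and function names (`Region`, `Sample`, `select`, `regions_on`) and the toy data are hypothetical, and `select` is just a simplified analogue of a metadata-based selection in GMQL.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    # Fixed coordinate part of a region, followed by free-form attributes (e.g. a score)
    chrom: str
    left: int
    right: int
    strand: str = "*"  # '+', '-' or '*' when the strand is unknown
    attributes: tuple = ()


@dataclass
class Sample:
    sample_id: int
    regions: list[Region] = field(default_factory=list)
    # Metadata as attribute-value pairs; the same attribute may appear more than once
    metadata: list[tuple[str, str]] = field(default_factory=list)


def select(dataset: list[Sample], attribute: str, value: str) -> list[Sample]:
    # Simplified metadata filter: keep the samples that carry the pair (attribute, value)
    return [s for s in dataset if (attribute, value) in s.metadata]


def regions_on(sample: Sample, chrom: str) -> list[Region]:
    # Simplified region filter on the chromosome coordinate only
    return [r for r in sample.regions if r.chrom == chrom]


# Hypothetical toy dataset with two samples
dataset = [
    Sample(1,
           [Region("chr1", 100, 200, "+", (7.5,)), Region("chr2", 50, 80, "-", (3.1,))],
           [("cell", "K562"), ("antibody", "CTCF")]),
    Sample(2,
           [Region("chr1", 300, 420, "*", (1.2,))],
           [("cell", "HeLa-S3"), ("antibody", "CTCF")]),
]

k562 = select(dataset, "cell", "K562")
print([s.sample_id for s in k562])       # [1]
print(len(regions_on(k562[0], "chr1")))  # 1
```

Keeping region data and metadata side by side in each sample is what lets a single query filter on experimental properties (the metadata) and on genomic coordinates (the regions) at once.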
BONARINI, ANDREA
PERNICI, BARBARA
15-Jan-2016
PhD thesis

Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/115522