Implementation of a genomic operation using the Apache Flink framework
HOANG, THE VINH
2014/2015
Abstract
Deciphering DNA sequences is a fundamental step in almost every branch of biological research, especially since the human genome was first published in 2001. Despite the tremendous progress scientists have made, barriers of throughput, scalability, and speed still prevent them from obtaining the essential information they need. Next Generation Sequencing (NGS) is a modern approach to sequencing that produces high-throughput, low-cost data. This technique has triggered numerous groundbreaking discoveries and is changing biological research. The group of Professor Stefano Ceri at Politecnico di Milano proposes a new paradigm for raising the level of abstraction in NGS data management, based on a Genometric Data Model (GDM) and the GenoMetric Query Language (GMQL). As part of this research, my thesis is an experimental implementation of two complex GMQL operations, JOIN and MAP, using the Apache Flink framework.

File | Description | Size | Format | Access
---|---|---|---|---
2015_07_Hoang.pdf | Thesis written report | 539.81 kB | Adobe PDF | Not accessible
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/108648