The advent of Next Generation Sequencing (NGS) produced an explosion in the amount of genomic data generated, which led to the birth and early development of personalized medicine. However, the tools currently employed to analyze these data still require too much time and power. To advance research in this field, new bioinformatic tools are needed that can efficiently handle the vast amount of genomic data and keep pace with NGS technologies. In this scenario, the aim of this thesis is the design and implementation of a memory-efficient, easy-to-use short sequence mapper that can be employed in various bioinformatic applications. At the core of the proposed tool is an efficient implementation of a succinct data structure, which compresses the genomic data while still supporting efficient queries on them. This work presents a comprehensive description of the data encoding scheme, together with a characterization of the proposed data structure in terms of memory utilization and execution time. To improve the performance and energy efficiency of the sequence mapping process, this thesis also proposes a custom hardware design, which leverages the compression capability of the proposed data structure to fully exploit the highly parallel architecture of Field Programmable Gate Arrays (FPGAs). We used this custom hardware architecture to develop BWaveR, a fast and power-efficient hybrid sequence mapper, made available through an intuitive web application that offers high usability and a good user experience. Finally, this work validates the developed tool to demonstrate the correctness and reliability of its results, and presents an extensive evaluation of the performance of the proposed hybrid system through a comparison with state-of-the-art equivalent software tools.
The experimental results show that the proposed hardware architecture provides application speed-up while significantly reducing energy consumption. Thus, BWaveR constitutes a valid solution for accelerating bioinformatic applications that involve genomic sequence mapping, allowing users to benefit from hardware acceleration without any development effort or knowledge of the underlying hardware architecture.
The development of increasingly efficient sequencing technologies, known as Next Generation Sequencing (NGS), has led to a very rapid increase in the amount of available genomic data, laying the foundations for the birth of personalized medicine. Unfortunately, analyzing such a quantity of data still requires too much time and energy today. New computational tools are therefore needed to accelerate research in this field and to ensure the development and democratization of personalized medicine. In this context, the aim of this thesis is the design and implementation of a tool for genomic sequence alignment that is efficient and easy to use in various bioinformatic applications. The tool is based on a succinct data structure that compresses genomic data effectively, reducing memory usage while still allowing fast access to the data. This work provides a thorough description of the proposed data structure, of its use, and of its characteristics in terms of memory usage and execution time. This thesis also presents the implementation of BWaveR, a heterogeneous system for sequence alignment that takes advantage of the small size of the proposed data structure to make the best use of the parallel architecture of Field Programmable Gate Arrays (FPGAs). These devices offer significant advantages in terms of energy efficiency, but they are particularly difficult to program. BWaveR, by contrast, makes the benefits of hardware acceleration available through a simple and intuitive web application, which ensures ease of use and a good user experience. The developed tool was validated and evaluated through comparison with equivalent software applications.
The validation tests demonstrate the reliability of BWaveR and confirm the consistency of the results it produces. Moreover, the experimental results show that the proposed hardware architecture can reduce both the execution time and the energy consumption of the genomic sequence alignment process. BWaveR is therefore a valid solution for accelerating a wide range of bioinformatic applications, allowing even users without programming skills to benefit from the advantages of a specialized hardware architecture.
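The abstract does not name the succinct data structure or the mapping algorithm used in the thesis. As a generic illustration of how an index can support exact short-read mapping through rank queries, the following minimal sketch builds a toy FM-index and runs a backward search. It is an assumption-laden example, not the thesis implementation: the function names `build_index` and `map_read` are hypothetical, and the rank table is stored uncompressed, whereas a succinct structure would replace it with a compressed representation.

```python
def build_index(reference):
    """Build a toy FM-index (illustrative only, not the thesis design)."""
    text = reference + "$"  # sentinel: sorts before A, C, G, T
    # Naive suffix array construction; fine for a small example.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch] = number of characters in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i] = occurrences of ch in bwt[:i] (uncompressed rank table).
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, c_table, occ


def map_read(read, index):
    """Return sorted start positions of exact matches of read."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for ch in reversed(read):  # backward search: last character first
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


index = build_index("ACGTACGT")
print(map_read("ACG", index))  # → [0, 4]
print(map_read("TT", index))   # → []
```

Each step of the search needs only two rank lookups per read character, which is why the memory footprint of the rank structure largely determines how well such an index fits in fast on-chip memory.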
BWaveR: an FPGA-accelerated genomic sequence mapper leveraging succinct data structures
Di DONATO, GUIDO WALTER
2018/2019
File | Description | Size | Format |
---|---|---|---|
thesis_DiDonato.pdf (Open Access from 08/04/2021) | Final Thesis text | 8.54 MB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/165584