FPGA-based PairHMM forward algorithm for DNA variant calling

Next Generation Sequencing (NGS) technologies have drastically reduced the cost and time of human genome sequencing. As a result, we now have access to a huge amount of genetic data that is fundamental to research in personalized medicine. By analyzing these data, it is possible to determine the genetic variations in an individual's DNA that can potentially cause the onset of a specific disease, either inherited or acquired. These variations can then be used as biological markers on the basis of which personalized diagnostic methods and drug treatments are developed. However, identifying significant associations requires analyzing a large number of samples and controls, which results in extremely long processing times. The main bottleneck of the whole process is therefore not data collection but data processing, which is inefficient even for a limited number of subjects. The most commonly used methodology for identifying genetic variants from sequencing data is known as variant calling. Among the software packages that implement this procedure, the most widespread is the Genome Analysis ToolKit (GATK). Specifically, variant calling is carried out in GATK by two tools, HaplotypeCaller (HC) and MuTect2, which identify inherited and somatic variants, respectively. Although GATK provides optimized algorithms and consolidated pipelines for the analysis of genetic data, a purely software-based implementation does not offer adequate performance for the workloads to which it is currently subjected. In this scenario, the use of Heterogeneous System Architectures (HSAs) represents a natural technological evolution, especially in view of the application of these methodologies in routine clinical practice.
Indeed, these computing systems provide a substantial number of heterogeneous processing resources, so the computational hotspots of the application can be offloaded to the most appropriate one. Developing dedicated hardware architectures that complement software implementations requires an accurate study of the problem, aimed at identifying the operations that most affect the performance of the application and that are best suited to being parallelized and, consequently, accelerated. In the specific case of variant calling, the Pair Hidden Markov Model (PairHMM) Forward Algorithm (FA) accounts for 70% of the total execution time and, given its highly parallel structure, is the target of the hardware implementation proposed in this work. The chosen hardware accelerator is based on a Field Programmable Gate Array (FPGA), as this technology makes it possible to achieve high performance while maintaining a relatively low power profile. Specifically, although our FPGA-based implementation does not match the execution time achieved by other state-of-the-art works, thanks to its power efficiency it achieves a significant improvement in performance per unit of power consumed with respect to highly parallelized software implementations and GPU-based designs.

Next-generation sequencing (NGS) technologies have made it possible to drastically reduce the cost and time needed to sequence the human genome. Consequently, we now have access to an enormous quantity of genetic data that is fundamental to research in personalized medicine. Analyzing these data makes it possible to determine the genetic variations in an individual's DNA that may cause the onset of a given disease, whether inherited or acquired. This information can then be used as biological markers on the basis of which personalized diagnostic methods and drug treatments are developed. However, finding significant associations requires analyzing a large number of samples and controls, which leads to extremely long processing times. Currently, the main limit of the whole process is therefore not data collection but data processing, which is inefficient even for a limited number of subjects. The most commonly used methodology for identifying genetic variants from sequencing data is known as variant calling. Among the software packages that implement this procedure, the most widespread is the Genome Analysis ToolKit (GATK). Specifically, variant calling is carried out in GATK by two tools, HaplotypeCaller (HC) and MuTect2, which identify inherited and somatic variants, respectively. Although GATK provides optimized algorithms and consolidated pipelines for the analysis of genetic data, a software-based implementation clearly does not offer adequate performance for current workloads. In this scenario, the use of heterogeneous system architectures (HSAs) represents a natural technological evolution.
Indeed, these architectures provide heterogeneous computing resources, and we can choose the most suitable one to implement the most computationally intensive steps of the application. Developing dedicated hardware architectures that complement software implementations requires an accurate study of the problem, aimed at identifying the operations that weigh most heavily on the performance of the application and that are best suited to being parallelized and, consequently, accelerated. In the specific case of variant calling, the Pair Hidden Markov Model (PairHMM) Forward Algorithm (FA) accounts for 70% of the total execution time and, given its highly parallel structure, is the target of the hardware implementation proposed in this work. The hardware accelerator we chose is based on a Field Programmable Gate Array (FPGA), as this technology makes it possible to achieve high performance while maintaining a relatively low power profile. In particular, although our FPGA-based implementation does not match the execution times of other state-of-the-art works, it achieves significant improvements in performance per unit of power consumed.
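
To make the "highly parallel structure" of the PairHMM Forward Algorithm more concrete, the following is a minimal Python sketch of a commonly used three-state (match, insertion, deletion) forward recurrence. It is not the design proposed in this work: the function name `pairhmm_forward`, the constant parameters `base_err`, `gap_open` and `gap_ext`, and the uniform initialization over haplotype positions are illustrative assumptions. Each cell `(i, j)` depends only on cells `(i-1, j-1)`, `(i-1, j)` and `(i, j-1)`, so all cells on the same anti-diagonal can be computed independently, which is the kind of property a hardware pipeline could exploit.

```python
def pairhmm_forward(read, hap, base_err=0.01, gap_open=0.001, gap_ext=0.1,
                    wavefront=False):
    """Likelihood of `read` given haplotype `hap` (illustrative sketch).

    All parameters are hypothetical constants; real tools typically derive
    them per base from quality scores.
    """
    R, H = len(read), len(hap)
    # Transition probabilities between match (M), insertion (I), deletion (D).
    t_mm = 1.0 - 2.0 * gap_open
    t_mi = t_md = gap_open
    t_ii = t_dd = gap_ext
    t_im = t_dm = 1.0 - gap_ext

    M = [[0.0] * (H + 1) for _ in range(R + 1)]
    I = [[0.0] * (H + 1) for _ in range(R + 1)]
    D = [[0.0] * (H + 1) for _ in range(R + 1)]
    # Assumption: the read may start at any haplotype position with equal weight.
    for j in range(H + 1):
        D[0][j] = 1.0 / H

    def update(i, j):
        # Emission: probability of observing read[i-1] aligned to hap[j-1].
        prior = 1.0 - base_err if read[i - 1] == hap[j - 1] else base_err / 3.0
        M[i][j] = prior * (t_mm * M[i - 1][j - 1]
                           + t_im * I[i - 1][j - 1]
                           + t_dm * D[i - 1][j - 1])
        I[i][j] = t_mi * M[i - 1][j] + t_ii * I[i - 1][j]
        D[i][j] = t_md * M[i][j - 1] + t_dd * D[i][j - 1]

    if wavefront:
        # Anti-diagonal order: cells with the same d = i + j are independent.
        for d in range(2, R + H + 1):
            for i in range(max(1, d - H), min(R, d - 1) + 1):
                update(i, d - i)
    else:
        # Plain row-major order, as in a straightforward software loop.
        for i in range(1, R + 1):
            for j in range(1, H + 1):
                update(i, j)

    # Sum over all haplotype positions at which the read may end.
    return sum(M[R][j] + I[R][j] for j in range(1, H + 1))


if __name__ == "__main__":
    print(pairhmm_forward("ACGT", "TTACGTTT"))
    print(pairhmm_forward("ACGT", "TTACGTTT", wavefront=True))
```

Both traversal orders fill the same tables; the anti-diagonal order only makes explicit which cells could be evaluated at the same time by parallel processing elements.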