Efficient Acceleration of the Pair-HMMs Forward Algorithm for GATK HaplotypeCaller on Graphics Processing Units

GATK HaplotypeCaller (HC) is a popular variant caller that is widely used to identify variants in complex genomes. However, its high variant detection accuracy comes at the cost of long execution times. In GATK HC, the pair-HMMs forward algorithm accounts for a large percentage of the total execution time. This article proposes accelerating the pair-HMMs forward algorithm on graphics processing units (GPUs) to improve the performance of GATK HC. It presents several GPU-based implementations of the algorithm and analyzes their performance bottlenecks on an NVIDIA Tesla K40 card with various data sets. Based on these results and the characteristics of GATK HC, we identify the GPU-based implementations with the highest performance for the analyzed data sets. Experimental results show that the GPU-based implementations of the pair-HMMs forward algorithm achieve a speedup of up to 5.47× over existing GPU-based implementations.
