Power-efficiency analysis of accelerated BWA-MEM implementations on heterogeneous computing platforms

Next-generation sequencing techniques have dramatically reduced the cost of sequencing genetic material, and the volume of sequenced data has grown accordingly. Processing this data is challenging in terms of both performance and power-efficiency. Heterogeneous computing can help on both fronts by enabling faster and more power-efficient solutions. In this paper, the power-efficiency of the BWA-MEM algorithm, a popular tool for genomic data mapping, is studied on two heterogeneous architectures. The performance and power-efficiency of an FPGA-based implementation using a single Xilinx Virtex-7 FPGA on an Alpha Data add-in card are compared with those of a GPU-based implementation using an NVIDIA GeForce GTX 970 and of the software-only baseline system. By offloading the Seed Extension phase to an accelerator, both implementations achieve a two-fold speedup in overall application-level performance over the software-only implementation. Moreover, the highly customizable nature of the FPGA results in much higher power-efficiency, as the FPGA consumes less than one fourth of the power of the GPU. To facilitate platform- and tool-agnostic comparisons, the base pairs per Joule unit is introduced as a measure of power-efficiency. The FPGA design is able to map up to 44 thousand base pairs per Joule, a 2.1x gain in power-efficiency compared to the software-only baseline.
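
The base pairs per Joule measure can be illustrated with a minimal sketch. It assumes the metric is the number of base pairs mapped divided by the energy consumed, with energy taken as average power times runtime; the abstract does not spell out the measurement procedure, so this is an assumption. The function names and all numeric inputs below are hypothetical placeholders, not measurements from the paper.

```python
def base_pairs_per_joule(base_pairs_mapped: float,
                         runtime_s: float,
                         avg_power_w: float) -> float:
    """Return base pairs mapped per Joule of energy consumed.

    Assumption: energy = average power (W) * runtime (s), since 1 W = 1 J/s.
    """
    if runtime_s <= 0 or avg_power_w <= 0:
        raise ValueError("runtime and power must be positive")
    energy_j = runtime_s * avg_power_w
    return base_pairs_mapped / energy_j


def efficiency_gain(candidate_bp_per_j: float, baseline_bp_per_j: float) -> float:
    """Return the power-efficiency gain of a candidate over a baseline."""
    return candidate_bp_per_j / baseline_bp_per_j


# Hypothetical placeholder inputs (NOT values from the paper):
# the same workload run on a baseline and on an accelerated system.
workload_bp = 1e9
baseline = base_pairs_per_joule(workload_bp, runtime_s=100.0, avg_power_w=200.0)
accelerated = base_pairs_per_joule(workload_bp, runtime_s=50.0, avg_power_w=250.0)
gain = efficiency_gain(accelerated, baseline)
```

Because the metric normalizes by energy rather than by time, a system that draws more power can still score higher, provided its runtime drops by a larger factor than its power rises.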
