Leveraging FPGAs for Accelerating Short Read Alignment

One of the key challenges facing genomics today is how to efficiently analyze the massive amounts of data produced by next-generation sequencing platforms. With general-purpose computing systems struggling to address this challenge, specialized processors such as the Field-Programmable Gate Array (FPGA) are receiving growing interest. How best to leverage this technology for accelerating genomic data analysis, however, remains largely unexplored. In this paper, we present a runtime reconfigurable architecture for accelerating short read alignment using FPGAs. This architecture exploits the reconfigurability of FPGAs to allow the development of fast yet flexible alignment designs. We apply this architecture to develop an alignment design that supports exact alignment and approximate alignment with up to two mismatches. Our design is based on the FM-index, with several optimizations to improve alignment performance: the $n$-step FM-index, index oversampling, a seed-and-compare stage, and bi-directional backtracking. Our design is implemented and evaluated on a 1U Maxeler MPC-X2000 dataflow node with eight Altera Stratix-V FPGAs. Measurements show that our design is 28 times faster than Bowtie2 running with 16 threads on dual Intel Xeon E5-2640 CPUs, and nine times faster than SOAP3-dp running on an NVIDIA Tesla C2070 GPU.
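For readers unfamiliar with the FM-index, the following is a minimal software sketch of exact matching by backward search. It is not the hardware design described above: it uses a plain one-step FM-index with a naive suffix array and a full occurrence table, and it omits the n-step index, oversampling, seed-and-compare, and mismatch backtracking. The function names (`build_fm_index`, `backward_search`, `find_exact`) are hypothetical and chosen only for illustration.

```python
def build_fm_index(text):
    """Build a naive FM-index: suffix array, C table and Occ table."""
    text += "$"  # sentinel, assumed lexicographically smallest
    # Naive suffix array construction; fine for a small illustration only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to the sentinel).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[c]: number of characters in text strictly smaller than c.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, c_table, occ, n):
    """Return the half-open suffix array interval [low, high) matching pattern."""
    low, high = 0, n
    # Process the pattern from its last character to its first.
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0, 0
        low = c_table[ch] + occ[ch][low]
        high = c_table[ch] + occ[ch][high]
        if low >= high:
            return 0, 0  # no suffix is prefixed by the current pattern tail
    return low, high


def find_exact(text, pattern):
    """Return the sorted start positions of exact occurrences of pattern."""
    sa, c_table, occ = build_fm_index(text)
    low, high = backward_search(pattern, c_table, occ, len(sa))
    return sorted(sa[low:high])
```

Each loop iteration in `backward_search` consumes one character of the read and narrows the suffix array interval. An n-step FM-index consumes several characters per iteration instead, which reduces the number of index lookups per read.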
