RAMPS: A Reconfigurable Architecture for Minimal Perfect Sequencing

The alignment of many short sequences of DNA, called reads, to a long reference genome is a common task in molecular biology. When the problem is expanded to handle typical workloads of billions of reads, execution time becomes critical. In this paper we present a novel reconfigurable architecture for minimal perfect sequencing (RAMPS). While existing solutions attempt to align a high percentage of the reads using a small memory footprint, RAMPS focuses on performing fast exact matching. Using the human genome as a reference, RAMPS aligns short reads hundreds of thousands of times faster than current software implementations such as SOAP2 or Bowtie, and about a thousand times faster than GPU implementations such as SOAP3. Whereas other aligners require hours to preprocess reference genomes, RAMPS can preprocess the reference human genome in a few minutes, opening the possibility of using new reference sources that are more genetically similar to the newly sequenced data.
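The exact-matching idea can be illustrated with a minimal software sketch. It is not the RAMPS hardware design, which this abstract does not detail. The sketch assumes fixed-length reads and uses an ordinary Python dictionary in place of a minimal perfect hash function. The names `build_index` and `align_exact` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_index(reference: str, read_len: int) -> dict[str, list[int]]:
    """Preprocess the reference: map every substring of length read_len
    to the list of positions where it occurs.

    Assumption: an ordinary dict stands in for a minimal perfect hash.
    """
    index = defaultdict(list)
    for pos in range(len(reference) - read_len + 1):
        index[reference[pos:pos + read_len]].append(pos)
    return dict(index)


def align_exact(index: dict[str, list[int]], read: str) -> list[int]:
    """Return all reference positions where the read matches exactly,
    or an empty list if the read does not occur in the reference."""
    return index.get(read, [])


# Toy reference and reads (illustrative data, not from the paper).
reference = "ACGTACGTTAGC"
index = build_index(reference, read_len=4)

hits = align_exact(index, "ACGT")    # occurs at positions 0 and 4
misses = align_exact(index, "GGGG")  # not present in the reference
```

Once the index is built, each read costs a single lookup. This is the property that makes exact matching attractive for high-throughput workloads, at the price of a larger memory footprint than compressed indexes.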
