SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs

MOTIVATION: We introduce SneakySnake, a highly parallel and highly accurate pre-alignment filter that greatly reduces the need for computationally costly sequence alignment. The key idea of SneakySnake is to reduce the approximate string matching (ASM) problem to the single net routing (SNR) problem in VLSI chip layout. In the SNR problem, the goal is to find the optimal path that connects two terminals with the least routing cost on a special grid layout that contains obstacles. The SneakySnake algorithm quickly solves the SNR problem and uses the optimal path it finds to decide whether sequence alignment is necessary. Reducing the ASM problem to the SNR problem also makes SneakySnake efficient to implement on CPUs, GPUs, and FPGAs.

RESULTS: SneakySnake improves the accuracy of pre-alignment filtering by up to four orders of magnitude compared to the state-of-the-art pre-alignment filters Shouji, GateKeeper, and SHD. For short sequences, SneakySnake accelerates Edlib (a state-of-the-art implementation of Myers's bit-vector algorithm) and Parasail (a state-of-the-art sequence aligner with a configurable scoring function) by up to 37.7× and 43.9× (>12× on average), respectively, with its CPU implementation, and by up to 413× and 689× (>400× on average), respectively, with FPGA and GPU acceleration. For long sequences, the CPU implementation of SneakySnake accelerates Parasail and KSW2 (the sequence aligner of minimap2) by up to 979× (276.9× on average) and 91.7× (31.7× on average), respectively. Because SneakySnake does not replace sequence alignment, users can still obtain all capabilities (e.g., configurable scoring functions) of the aligner of their choice, unlike existing acceleration efforts that sacrifice some aligner capabilities.

AVAILABILITY: https://github.com/CMU-SAFARI/SneakySnake.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
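
The reduction described under MOTIVATION can be pictured with a small sketch. The abstract does not spell out how the grid is built or traversed, so everything below is an illustrative assumption rather than the authors' implementation: the grid has one row per diagonal shift from -e to +e, a cell is an obstacle when the two characters on that diagonal differ, and a greedy walk repeatedly takes the longest obstacle-free run available in any row, paying one unit each time it must cross an obstacle. The function names (`build_grid`, `count_obstacles`, `passes_filter`) and the edit threshold `e` are hypothetical.

```python
def build_grid(ref, query, e):
    """Assumed grid layout: one row per diagonal shift in -e..e.

    A cell holds 0 where query[j] matches ref[j + shift] and 1 (an
    obstacle) where it mismatches or falls outside the reference.
    """
    grid = []
    for shift in range(-e, e + 1):
        row = []
        for j in range(len(query)):
            i = j + shift
            match = 0 <= i < len(ref) and ref[i] == query[j]
            row.append(0 if match else 1)
        grid.append(row)
    return grid


def count_obstacles(grid, e):
    """Greedy left-to-right walk; stops early once more than e obstacles are crossed."""
    width = len(grid[0])
    col = 0
    obstacles = 0
    while col < width:
        # Longest run of free cells starting at this column, over all rows.
        longest = 0
        for row in grid:
            run = 0
            while col + run < width and row[col + run] == 0:
                run += 1
            longest = max(longest, run)
        col += longest
        if col < width:
            # Every row is blocked here: cross one obstacle and move on.
            obstacles += 1
            col += 1
            if obstacles > e:
                break
    return obstacles


def passes_filter(ref, query, e):
    """True if the pair should still be handed to a real aligner."""
    return count_obstacles(build_grid(ref, query, e), e) <= e
```

In this sketch, a pair that passes is only a candidate: the filter decides whether alignment is worth running, and the aligner of choice still computes the actual alignment.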
