POSTER: BioSEAL: In-Memory Biological Sequence Alignment Accelerator for Large-Scale Genomic Data

Genome sequences contain hundreds of millions of DNA base pairs. Finding the degree of similarity between two genomes requires executing a compute-intensive dynamic programming algorithm, such as Smith-Waterman. Traditional von Neumann architectures have limited parallelism and cannot provide an efficient solution for large-scale genomic data. Approximate heuristic methods (e.g., BLAST) are commonly used instead, but they are suboptimal and still compute-intensive. In this work, we present BioSEAL, a biological sequence alignment accelerator. BioSEAL is a massively parallel non-von Neumann processing-in-memory architecture for large-scale DNA and protein sequence alignment. It is based on resistive content-addressable memory, which is capable of energy-efficient, high-performance associative processing. We present an associative processing algorithm for entire-database sequence alignment on BioSEAL and compare its performance and power consumption with state-of-the-art solutions. We show that BioSEAL can achieve up to 57x speedup and 156x better energy efficiency compared with existing solutions for genome sequence alignment and protein sequence database search.
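To make the cost of the dynamic programming step concrete, the following is a minimal software sketch of Smith-Waterman local-alignment scoring. It is not the BioSEAL associative algorithm. The function name `smith_waterman_score`, the linear gap penalty, and the scoring values (match = 2, mismatch = -1, gap = -1) are illustrative assumptions rather than parameters taken from this work. The nested loops fill one cell per pair of positions, so the work grows with the product of the two sequence lengths.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between strings a and b.

    Illustrative scoring only: linear gap penalty, hypothetical defaults.
    """
    # Only the previous row is kept, since each cell depends on its
    # upper, left, and upper-left neighbours.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared substring "ACG" scores 3 matches * 2 = 6.
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Along each anti-diagonal of the score matrix, cells do not depend on one another. This is the parallelism that hardware accelerators in general can exploit.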
