Parallel Optimal Pairwise Biological Sequence Comparison

Many bioinformatics applications, such as optimal pairwise biological sequence comparison, demand large amounts of computing resources and are therefore excellent candidates to run on high-performance computing (HPC) platforms. In the last two decades, a large number of HPC-based solutions have been proposed for this problem. They run on different platforms and target different types of comparisons with slightly different algorithms, which makes a comparative analysis of these approaches very difficult. This article proposes a classification of parallel optimal pairwise sequence comparison solutions in order to highlight their main characteristics in a unified way. We then discuss several HPC-based solutions, including clusters of multicores and accelerators such as Cell Broadband Engines (CellBEs), Field-Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs), and Intel Xeon Phi, as well as hybrid solutions that combine two or more platforms, providing the current landscape of the main proposals in this area. Finally, we present open questions and perspectives in this research field.
