Parallelizing and optimizing a bioinformatics pairwise sequence alignment algorithm for many-core architecture

Computer engineering is evolving at an accelerated pace: hardware is advancing towards new chip multiprocessor (CMP) architectures, and the supporting software is moving towards new programming and abstraction paradigms, in order to obtain maximum hardware efficiency at low cost. In this context, Tilera Corporation has developed a new CMP architecture with 64 cores (tiles), called Tile64, and has launched several Peripheral Component Interconnect Express (PCIe) cards that are used and monitored from a host personal computer (PC). These cards can execute parallel applications written in C/C++ and compiled with the Tile-GCC compiler. We have previously demonstrated the usefulness of the Tile64 architecture for bioinformatics [S. Galvez, D. Diaz, P. Hernandez, F.J. Esteban, J.A. Caballero, G. Dorado, Next-generation bioinformatics: using many-core processor architecture to develop a web service for sequence alignment, Bioinformatics, 26 (2010) 683-686]. We chose a bioinformatics algorithm to test this many-core Tile64 architecture because of the demanding needs of current bioinformatics: data-intensive workloads, high space and time requirements, and massive calculation. This algorithm, known as Needleman-Wunsch/Smith-Waterman (NW/SW), obtains an optimal sequence alignment in quadratic time and space, but must be optimized to take full advantage of parallel computing. In this paper we redesign, implement and fine-tune this algorithm, introducing key optimizations and changes that exploit specific Tile64 characteristics: RISC architecture, each tile's local cache, memory word length, shared memory usage, RAM file system, inter-tile communication and job selection from a pool. The resulting algorithm, named MC64-NW/SW for Multicore64 Needleman-Wunsch/Smith-Waterman, achieves a gain of ~1000% when compared with the same algorithm on an x86 multi-core architecture.
As far as we know, our NW/SW implementation is the fastest ever published for a standalone PC when aligning a pair of sequences longer than 20 kb.
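
To make the quadratic time and space cost concrete, the following is a minimal sequential sketch of the Needleman-Wunsch global alignment score in C. It is not the MC64-NW/SW implementation and includes none of the Tile64 optimizations; the function name `nw_score`, the linear gap penalty and the scoring parameters are assumptions chosen for illustration only.

```c
#include <stdlib.h>
#include <string.h>

/* Largest of three integers. */
static int max3(int a, int b, int c) {
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/*
 * Illustrative Needleman-Wunsch global alignment score (hypothetical helper,
 * not taken from the paper). Uses a linear gap penalty and fills the full
 * (n+1) x (m+1) matrix, so both time and memory grow as O(n*m).
 * Returns 0 on success and stores the score in *score; returns -1 if the
 * matrix cannot be allocated.
 */
int nw_score(const char *a, const char *b,
             int match, int mismatch, int gap, int *score) {
    size_t n = strlen(a), m = strlen(b);
    size_t cols = m + 1;
    int *h = malloc((n + 1) * cols * sizeof *h);
    if (h == NULL) {
        return -1;
    }

    /* First column and first row: aligning a prefix against gaps only. */
    for (size_t i = 0; i <= n; i++) {
        h[i * cols] = (int)i * gap;
    }
    for (size_t j = 0; j <= m; j++) {
        h[j] = (int)j * gap;
    }

    /* Each cell depends on its upper-left, upper and left neighbours. */
    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            int sub = (a[i - 1] == b[j - 1]) ? match : mismatch;
            h[i * cols + j] = max3(h[(i - 1) * cols + (j - 1)] + sub,
                                   h[(i - 1) * cols + j] + gap,
                                   h[i * cols + (j - 1)] + gap);
        }
    }

    *score = h[n * cols + m];
    free(h);
    return 0;
}
```

With match = 1, mismatch = -1 and gap = -1, calling `nw_score("ACGT", "AGT", 1, -1, -1, &s)` stores 2 in `s` (three matches and one gap). Because each cell depends only on its upper-left, upper and left neighbours, cells on the same anti-diagonal are independent of one another, which is the property that parallel implementations of this recurrence generally rely on.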
