Implementation of Hybrid Alignment Algorithm for Protein Database Search on the SW26010 Many-Core Processor

In biological research, sequence alignment algorithms aim to find similarities between sequences. As the size of biological databases increases exponentially, the cost of sequence alignment also grows rapidly, resulting in long computation times. The Sunway TaihuLight is the world's first heterogeneous supercomputer with a peak performance of over 100 PFlops, and it provides a new hardware platform for database search. In this paper, we present an efficient method for protein database search on the Sunway TaihuLight supercomputer and optimize it to make full use of the SW26010 processor. In our approach, we design a hybrid sequence alignment that combines the Smith-Waterman local alignment algorithm with the Needleman-Wunsch global alignment algorithm. The protein database search is parallelized with the Message Passing Interface (MPI) and the accelerated thread library (Athread). Experimental results on the Swiss-Prot database show that our implementation effectively exploits the special hardware architecture of the SW26010 processor and achieves a speedup of up to 15.91 times on a single node. In addition, we scale the search to 64 nodes to test the scalability of the parallel method on the Sunway TaihuLight system, and the results show that our parallel implementation of protein database search has good scalability and reliability.
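To make the two building blocks of the hybrid alignment concrete, the following is a minimal Python sketch of the Smith-Waterman (local) and Needleman-Wunsch (global) score recurrences. It is an illustration under stated assumptions, not the implementation described in this paper: the match/mismatch scores and the linear gap penalty are placeholder values (a real protein search would typically use a substitution matrix), and the `hybrid_search` function, which ranks database entries by local score and reports the global score alongside it, is a hypothetical way of combining the two, since the abstract does not specify the combination rule. The sketch is sequential and does not model the MPI or Athread parallelization.

```python
def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score (linear gap penalty, assumed values)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def nw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score (linear gap penalty, assumed values)."""
    # First row: aligning a prefix of b against an empty prefix of a costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Global alignment: no zero floor, both sequences are aligned end to end.
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[len(b)]


def hybrid_search(query, database):
    """Hypothetical combination: rank entries by local score, report global score too.

    `database` is a list of (name, sequence) pairs.
    """
    hits = [(name, sw_score(query, seq), nw_score(query, seq))
            for name, seq in database]
    # Sort by local score, breaking ties with the global score.
    return sorted(hits, key=lambda h: (h[1], h[2]), reverse=True)


if __name__ == "__main__":
    db = [("exact", "ACGT"), ("embedded", "XXACGTYY"), ("unrelated", "TTTT")]
    for name, local, global_ in hybrid_search("ACGT", db):
        print(name, local, global_)
```

In a database search of this kind, each query-versus-entry comparison is independent of the others, which is what makes the workload amenable to distribution across processes and threads.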
