A scalable multiple pairwise protein sequence alignment acceleration using hybrid CPU–GPU approach

Bioinformatics is an interdisciplinary field that applies techniques from information technology, mathematics, and statistics to the study of large biological data sets. It involves several computational techniques, such as sequence and structural alignment, data mining, macromolecular geometry, protein structure prediction, and gene finding. Protein structure and sequence analysis are vital to understanding cellular processes, which in turn contributes to the development of drugs that target metabolic pathways. Protein sequence alignment is concerned with identifying the similarities and relationships among different proteins. In this paper, we target two well-known protein sequence alignment algorithms: the Needleman–Wunsch and the Smith–Waterman algorithms. Both are computationally expensive, which limits their applicability to large data sets. We therefore propose a hybrid parallel approach that combines the capabilities of multi-core CPUs with the power of contemporary GPUs and significantly speeds up the execution of the target algorithms. The validity of our approach is tested on real protein sequences, and its scalability is verified on randomly generated sequences with predefined similarity levels. The results show that the proposed hybrid approach is up to 242 times faster than the sequential approach.
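To make the cost of the two target algorithms concrete, the following is a minimal sequential sketch in Python, not the paper's implementation. The scoring values (match +1, mismatch -1, linear gap -1) and the names `nw_score`, `sw_score`, and `pairwise_scores` are assumptions chosen for illustration. Protein alignment in practice usually relies on a substitution matrix and often on affine gap penalties.

```python
from itertools import combinations

def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score (assumed linear gap penalty)."""
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # Column 0: prefix of a against the empty prefix of b.
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def sw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score (assumed linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets a local alignment restart anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

def pairwise_scores(seqs, score_fn):
    """Score every unordered pair of sequences; each pair is independent."""
    return {(i, j): score_fn(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}
```

Each call fills a table of (len(a) + 1) × (len(b) + 1) cells, and a set of n sequences produces n(n − 1)/2 pairs, so the sequential cost grows quickly with both sequence length and data set size. Because no pair depends on another, the pairs can be computed in parallel, which is the kind of independence a hybrid CPU–GPU scheme can exploit.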
