CUDA ClustalW: An efficient parallel algorithm for progressive multiple sequence alignment on Multi-GPUs

For biological applications, sequence alignment is an important strategy to analyze DNA and protein sequences. Multiple sequence alignment is an essential methodology to study biological data, such as homology modeling, phylogenetic reconstruction and etc. However, multiple sequence alignment is a NP-hard problem. In the past decades, progressive approach has been proposed to successfully align multiple sequences by adopting iterative pairwise alignments. Due to rapid growth of the next generation sequencing technologies, a large number of sequences can be produced in a short period of time. When the problem instance is large, progressive alignment will be time consuming. Parallel computing is a suitable solution for such applications, and GPU is one of the important architectures for contemporary parallel computing researches. Therefore, we proposed a GPU version of ClustalW v2.0.11, called CUDA ClustalW v1.0, in this work. From the experiment results, it can be seen that the CUDA ClustalW v1.0 can achieve more than 33× speedups for overall execution time by comparing to ClustalW v2.0.11.

[1]  Edans Flavius de Oliveira Sandes,et al.  Smith-Waterman Alignment of Huge Sequences with GPU in Linear Space , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[2]  Sheng-Ta Lee,et al.  GPU-Based Cloud Service for Smith-Waterman Algorithm Using Frequency Distance Filtration Scheme , 2013, BioMed research international.

[3]  Cole Trapnell,et al.  Optimizing data intensive GPGPU computations for DNA sequence alignment , 2009, Parallel Comput..

[4]  Edans Flavius de Oliveira Sandes,et al.  CUDAlign: using GPU to accelerate the comparison of megabase genomic sequences , 2010, PPoPP '10.

[5]  R. Doolittle,et al.  Progressive sequence alignment as a prerequisitetto correct phylogenetic trees , 2007, Journal of Molecular Evolution.

[6]  Nikolaos V. Sahinidis,et al.  GPU-BLAST: using graphics processors to accelerate protein sequence alignment , 2010, Bioinform..

[7]  Chun-Yuan Lin,et al.  Efficient parallel algorithm for multiple sequence alignments with regular expression constraints on graphics processing units , 2014, Int. J. Comput. Sci. Eng..

[8]  Srinivas Aluru,et al.  Space and time optimal parallel sequence alignments , 2004, IEEE Transactions on Parallel and Distributed Systems.

[9]  D. Higgins,et al.  T-Coffee: A novel method for fast and accurate multiple sequence alignment. , 2000, Journal of molecular biology.

[10]  Stephen W. Poole,et al.  Acceleration of the Smith-Waterman algorithm using single and multiple graphics processors , 2010, J. Comput. Phys..

[11]  D. Lipman,et al.  The multiple sequence alignment problem in biology , 1988 .

[12]  Zhaolei Zhang,et al.  SNPdryad: predicting deleterious non-synonymous human SNPs using only orthologous protein sequences , 2014, Bioinform..

[13]  Kevin Skadron,et al.  Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).

[14]  Zhaolei Zhang,et al.  DNA motif elucidation using belief propagation , 2013, Nucleic acids research.

[15]  Christus,et al.  A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins , 2022 .

[16]  Jacek Blazewicz,et al.  G-MSA - A GPU-based, fast and accurate algorithm for multiple sequence alignment , 2013, J. Parallel Distributed Comput..

[17]  Daniel S. Hirschberg,et al.  Algorithms for the Longest Common Subsequence Problem , 1977, JACM.

[18]  Weiguo Liu,et al.  CUDA-BLASTP: Accelerating BLASTP on CUDA-Enabled Graphics Hardware , 2011, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[19]  Amitabh Varshney,et al.  High-throughput sequence alignment using Graphics Processing Units , 2007, BMC Bioinformatics.

[20]  Tao Jiang,et al.  On the Complexity of Multiple Sequence Alignment , 1994, J. Comput. Biol..

[21]  Yongchao Liu,et al.  MSA-CUDA: Multiple Sequence Alignment on Graphics Processing Units with CUDA , 2009, 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors.

[22]  K. Katoh,et al.  MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. , 2002, Nucleic acids research.

[23]  Yongchao Liu,et al.  CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions , 2010, BMC Research Notes.

[24]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[25]  Yongchao Liu,et al.  CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units , 2009, BMC Research Notes.

[26]  Giorgio Valle,et al.  CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment , 2008, BMC Bioinformatics.

[27]  Olivier Poch,et al.  BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs , 1999, Bioinform..

[28]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[29]  Robert C. Edgar,et al.  MUSCLE: multiple sequence alignment with high accuracy and high throughput. , 2004, Nucleic acids research.

[30]  Richard E. Korf,et al.  Divide-and-Conquer Frontier Search Applied to Optimal Sequence Alignment , 2000, AAAI/IAAI.

[31]  Yongchao Liu,et al.  CUDA-MEME: Accelerating motif discovery in biological sequences using CUDA-enabled graphics processing units , 2010, Pattern Recognit. Lett..

[32]  Kuo-Bin Li,et al.  ClustalW-MPI: ClustalW analysis using distributed and parallel computing , 2003, Bioinform..