Efficient parallel algorithm for multiple sequence alignments with regular expression constraints on graphics processing units

Multiple sequence alignment with constraints has become an important problem in computational biology. Constrained sequence alignment was proposed to incorporate a biologist's domain knowledge into sequence alignment, so that user-specified residues or segments are aligned together in the result. Over the past decade, a series of constrained multiple sequence alignment tools have been proposed in the literature. RE-MuSiC is the newest of these tools; it supports regular expression constraints and is useful for a wide range of biological applications. However, the computation time of RE-MuSiC is large when there are many sequences or the sequences are long, which limits its practical use. This paper therefore proposes GPU-REMuSiC v1.0, a tool that reduces the computation time of RE-MuSiC by using graphics processing units (GPUs) with CUDA. In the experimental results, GPU-REMuSiC v1.0 can achieve a 29× speedup in overall computation time.
