GPU-based cloud service for multiple sequence alignments with regular expression constraints

Multiple sequence alignment with constraints has become an important problem in computational biology. Constrained sequence alignment was proposed to incorporate biologists' domain knowledge into sequence alignment, so that user-specified residues or segments are aligned together in the alignment results. Over the past decade, a series of constrained multiple sequence alignment tools have been proposed in the literature. GPU-REMuSiC is the newest of these tools: it supports regular expression constraints and runs on graphics processing units (GPUs) with CUDA. According to the experimental results, GPU-REMuSiC achieves a 29× speedup in overall computation time. However, the execution environment of GPU-REMuSiC must be built before use, which is a barrier for biologists. We therefore design an intuitive, user-friendly interface for a prospective cloud server with GPUs. Through this interface, we can send the input data over the network to the remote server without cumbersome setup on the local host, and then receive the alignment results from the remote GPU cloud server.
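To make the idea of a regular expression constraint concrete, the following is a minimal Python sketch, not the GPU-REMuSiC algorithm itself (which is not described here). It only locates the first occurrence of a constraint in each sequence and pads with gaps so that the matched segments share the same columns; it does not optimize an alignment score. The function names, the example sequences, and the pattern `C[AG]+H` are all hypothetical and chosen purely for illustration.

```python
import re


def find_constraint_segments(sequences, pattern):
    """Return the (start, end) span of the first match of `pattern`
    in each sequence, or None where the pattern does not occur."""
    regex = re.compile(pattern)
    spans = []
    for seq in sequences:
        match = regex.search(seq)
        spans.append(match.span() if match else None)
    return spans


def anchor_on_constraint(sequences, pattern, gap="-"):
    """Pad sequences with gaps so the matched segments occupy the same
    columns. This is only an illustration of 'segments aligned together';
    the flanking regions are not aligned in any optimal sense."""
    spans = find_constraint_segments(sequences, pattern)
    if any(span is None for span in spans):
        raise ValueError("constraint does not occur in every sequence")

    left = max(start for start, _ in spans)            # column where matches begin
    width = max(end - start for start, end in spans)   # widest matched segment

    rows = []
    for seq, (start, end) in zip(sequences, spans):
        prefix = gap * (left - start) + seq[:start]
        segment = seq[start:end] + gap * (width - (end - start))
        rows.append(prefix + segment + seq[end:])

    total = max(len(row) for row in rows)
    return [row + gap * (total - len(row)) for row in rows]


if __name__ == "__main__":
    # Hypothetical protein-like sequences and constraint.
    example = ["MKCAAHTT", "GGMCGHA", "CAAAHKL"]
    for row in anchor_on_constraint(example, r"C[AG]+H"):
        print(row)
```

In a real constrained aligner the flanking regions would also be aligned by dynamic programming, which is the part that benefits from GPU acceleration.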
