Paper information - A memory-efficient algorithm for multiple sequence alignment with constraints


Abstract

Motivation: Recently, the concept of constrained sequence alignment was proposed to incorporate biologists' knowledge about the structures, functionalities, or consensuses of their datasets into sequence alignment, so that user-specified residues/nucleotides are aligned together in the computed alignment. Currently available programs use the so-called progressive approach to efficiently obtain a constrained alignment of several sequences. However, the kernels of these programs, the dynamic programming algorithms for computing an optimal constrained alignment between two sequences, require 𝒪(γn²) memory, where γ is the number of constraints and n is the maximum of the sequence lengths. This high memory requirement limits these programs to aligning short sequences only.

Results: We adopt the divide-and-conquer approach to design a memory-efficient algorithm for computing an optimal constrained alignment between two sequences, which greatly reduces the memory requirement of the dynamic programming approaches at the expense of a small constant factor in CPU time. The new algorithm consumes only 𝒪(αn) space, where α is the sum of the lengths of the constraints and usually α ≪ n in practical applications. Based on this algorithm, we have developed a memory-efficient tool for multiple sequence alignment with constraints.

Availability: http://genome.life.nctu.edu.tw/MUSICME

Contact: cllu@mail.nctu.edu.tw

Chin Lung Lu | Yen Pin Huang
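
The abstract's central idea, trading a small constant factor in CPU time for a large reduction in memory through divide and conquer, can be illustrated with Hirschberg's classic linear-space technique for ordinary pairwise alignment. The sketch below is not the authors' constrained algorithm: it handles no constraints, and the function names and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for illustration only.

```python
def nw_last_row(a, b, match=1, mismatch=-1, gap=-1):
    """Last row of the global-alignment score table for a vs b.

    Only two rows are kept at any time, so memory is O(len(b)).
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev


def hirschberg(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment (row_a, row_b) computed in linear space."""
    if len(a) == 0:
        return "-" * len(b), b
    if len(b) == 0:
        return a, "-" * len(a)
    if len(a) == 1:
        # Base case: pair a[0] with its best partner in b, or with a gap.
        best_k, best = None, (len(b) + 1) * gap
        for k, ch in enumerate(b):
            s = (len(b) - 1) * gap + (match if a[0] == ch else mismatch)
            if s > best:
                best_k, best = k, s
        if best_k is None:
            return a + "-" * len(b), "-" + b
        return "-" * best_k + a + "-" * (len(b) - best_k - 1), b

    # Divide: split a in half and find the column of b where an optimal
    # alignment crosses the middle row, using forward and reverse score rows.
    mid = len(a) // 2
    left = nw_last_row(a[:mid], b, match, mismatch, gap)
    right = nw_last_row(a[mid:][::-1], b[::-1], match, mismatch, gap)
    split = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])

    # Conquer: align the two halves independently and concatenate.
    top_a, top_b = hirschberg(a[:mid], b[:split], match, mismatch, gap)
    bot_a, bot_b = hirschberg(a[mid:], b[split:], match, mismatch, gap)
    return top_a + bot_a, top_b + bot_b


def alignment_score(row_a, row_b, match=1, mismatch=-1, gap=-1):
    """Score an alignment given as two equal-length gapped strings."""
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


if __name__ == "__main__":
    row_a, row_b = hirschberg("AGTACGCA", "TATGC")
    print(row_a)
    print(row_b)
    print(alignment_score(row_a, row_b))
```

The forward and reverse passes each scan the score table once more than a plain dynamic program would, which is where the constant-factor time overhead comes from; in exchange, no full quadratic table is ever stored.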
