Approximation Algorithms for Multiple Sequence Alignment Under a Fixed Evolutionary Tree

We consider the problem of aligning sequences related by a given evolutionary tree: given a fixed tree whose leaves are labeled with sequences, find ancestral sequences to label the internal nodes so as to minimize the total cost of all edges in the tree. The cost of an edge is the edit distance between the sequences labeling its endpoints. In this paper, we consider the case where the given tree is a regular d-ary tree for some fixed d, and provide a (d+1)/(d−1)-approximation algorithm for this problem that runs in time O(d(2kn)^d + n^2 k^2 d), where k is the number of leaves in the tree and n is the maximum length of any of the sequences labeling the leaves.
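To make the objective concrete, the following Python sketch evaluates the cost of one candidate labeling of a small binary tree (d = 2). It is only an illustration of the cost function, not the approximation algorithm of the paper. It assumes unit-cost edit distance (insertions, deletions, and substitutions each cost 1), which the abstract does not specify, and the names `edit_distance`, `tree_cost`, `edges`, and `labels` are hypothetical.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (assumed cost model: each insertion,
    deletion, or substitution costs 1), via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def tree_cost(edges: List[Tuple[str, str]], labels: Dict[str, str]) -> int:
    """Total tree cost: sum of edit distances over all edges."""
    return sum(edit_distance(labels[u], labels[v]) for u, v in edges)


# Hypothetical example: root r, internal nodes u and v, leaves l1..l4.
edges = [("r", "u"), ("r", "v"),
         ("u", "l1"), ("u", "l2"),
         ("v", "l3"), ("v", "l4")]
labels = {
    "l1": "ACGT", "l2": "AGT", "l3": "ACGA", "l4": "ACG",  # given leaves
    "u": "ACGT", "v": "ACG", "r": "ACG",  # one candidate ancestral labeling
}

print(tree_cost(edges, labels))  # prints 3
```

The problem asks for the internal labels (here `u`, `v`, and `r`) that minimize this total; the sketch only scores a labeling that has already been chosen.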

[13]  R. Ravi,et al.  Approximation Algorithms for Multiple Sequence Alignment Under a Fixed Evolutionary Tree , 1995, CPM.
