Better methods for solving parsimony and compatibility

Evolutionary tree reconstruction is a challenging problem with important applications in biology and linguistics. In biology, one of the most promising approaches to tree reconstruction is to find the "maximum parsimony" tree, while in linguistics the "maximum compatibility" method has proven very useful. However, both problems are NP-hard, and current approaches to solving them amount to heuristic searches through the space of possible tree topologies (a search which can, on large trees, take months to complete). In this paper, we present a new technique, Optimal Tree Refinement, for reconstructing very large trees. Our technique is motivated by recent experimental studies which have shown that certain polynomial-time methods often return contractions of the true tree, that is, trees obtained from the true tree by collapsing some of its edges. Optimal Tree Refinement then seeks a refinement of such a contraction that is optimal under the chosen criterion. We study the use of this technique in solving maximum parsimony and maximum compatibility, and present both hardness results and polynomial-time algorithms.
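The abstract does not spell out the refinement algorithms, so the following is only a minimal sketch of the problem being solved, not the paper's method. It assumes trees are nested Python tuples of leaf names (a tuple with more than two children is an unresolved polytomy, as in a contraction of the true tree) and that each character is a dict from leaf name to state. It scores a tree under maximum parsimony with a unit-cost, Sankoff-style dynamic program, counts compatible characters using the standard fact that a character with r observed states is compatible with a tree exactly when it can be fitted with r - 1 changes, and then finds an optimal refinement by exhaustive enumeration. All function names are hypothetical, and the search is exponential in the polytomy degree.

```python
from itertools import product

INF = float("inf")

# Hypothetical representation: a tree is a leaf name (str) or a tuple of
# child subtrees; tuples with more than two children are polytomies.

def node_costs(tree, character, alphabet):
    """Unit-cost small parsimony (Sankoff-style DP); valid for any node degree."""
    if isinstance(tree, str):
        return {s: 0 if s == character[tree] else INF for s in alphabet}
    kids = [node_costs(c, character, alphabet) for c in tree]
    return {
        s: sum(min(k[t] + (s != t) for t in alphabet) for k in kids)
        for s in alphabet
    }

def parsimony_score(tree, character):
    # Restricting internal labels to observed states is safe under unit cost.
    alphabet = set(character.values())
    return min(node_costs(tree, character, alphabet).values())

def parsimony_total(tree, characters):
    return sum(parsimony_score(tree, c) for c in characters)

def compatibility_count(tree, characters):
    # A character with r observed states is compatible with the tree
    # exactly when it can be fitted with r - 1 state changes.
    return sum(
        parsimony_score(tree, c) == len(set(c.values())) - 1 for c in characters
    )

def insert_everywhere(shape, x):
    """Yield each rooted binary shape obtained by attaching leaf x to one edge."""
    yield (shape, x)  # attach x above the current root
    if isinstance(shape, tuple):
        left, right = shape
        for new_left in insert_everywhere(left, x):
            yield (new_left, right)
        for new_right in insert_everywhere(right, x):
            yield (left, new_right)

def rooted_binary_shapes(n):
    """All rooted binary trees on indices 0..n-1; (2n-3)!! of them for n >= 2."""
    if n == 1:
        yield 0
        return
    for shape in rooted_binary_shapes(n - 1):
        yield from insert_everywhere(shape, n - 1)

def substitute(shape, subtrees):
    if isinstance(shape, int):
        return subtrees[shape]
    return tuple(substitute(s, subtrees) for s in shape)

def binary_refinements(tree):
    """Yield every binary refinement of tree, resolving each polytomy all ways."""
    if isinstance(tree, str):
        yield tree
        return
    for kids in product(*(list(binary_refinements(c)) for c in tree)):
        for shape in rooted_binary_shapes(len(kids)):
            yield substitute(shape, kids)

def optimal_refinement(tree, characters, score=parsimony_total):
    """Exhaustively find the refinement minimizing score(tree, characters)."""
    best, best_score = None, INF
    for candidate in binary_refinements(tree):
        value = score(candidate, characters)
        if value < best_score:
            best, best_score = candidate, value
    return best, best_score
```

For example, refining the star tree `("A", "B", "C", "D")` under the characters `{A:0, B:0, C:1, D:1}` (twice) and `{A:0, B:1, C:0, D:1}` returns `((('A', 'B'), 'C'), 'D')` with total parsimony score 4, the topology grouping A with B. Passing `score=lambda t, cs: -compatibility_count(t, cs)` instead searches for a refinement that maximizes the number of compatible characters. Because parsimony is invariant to where the tree is rooted, many rooted candidates share an unrooted topology; this wastes work but does not affect correctness.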
