Polyhedral Geometry of Phylogenetic Rogue Taxa

It is well known among phylogeneticists that adding an extra taxon (e.g. a species) to a data set can alter the structure of the optimal phylogenetic tree in surprising ways. However, little is known about this “rogue taxon” effect. In this paper we use tools from polyhedral geometry to characterize the behavior of balanced minimum evolution (BME) phylogenetics on data sets to which such a taxon has been added. First, we show that for any distance matrix there exist distances to a “rogue taxon” such that the BME-optimal tree for the data set with the new taxon does not contain any of the nontrivial splits (bipartitions) of the optimal tree for the original data. Second, we prove a theorem that restricts the topology of BME-optimal trees for such data sets, thus showing that a rogue taxon cannot have an arbitrary effect on the optimal tree. Third, we computationally construct polyhedral cones that give complete answers for BME rogue taxon behavior when the original data fit a tree on four, five, or six taxa. We use these cones to derive sufficient conditions for rogue taxon behavior on four taxa, and to estimate the frequency of the rogue taxon effect via simulation.
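
As a small illustration of the effect described above, the following sketch finds the BME-optimal tree by brute force for a four-taxon data set, then again after appending a fifth taxon. It is not the polyhedral method of the paper. It assumes Pauplin's formula for the BME length of a tree topology, l(T) = sum over pairs i < j of 2^(1 - p_ij) d_ij, where p_ij is the number of edges on the path between leaves i and j. The function names and all distance values are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def all_trees(n):
    """Yield every unrooted binary tree on leaves 0..n-1 (n >= 3) as an edge list."""
    def grow(edges, next_leaf, next_internal):
        if next_leaf == n:
            yield edges
            return
        # Attach the next leaf to the middle of each existing edge in turn.
        for i, (u, v) in enumerate(edges):
            w = next_internal
            new_edges = edges[:i] + edges[i + 1:] + [(u, w), (w, v), (next_leaf, w)]
            yield from grow(new_edges, next_leaf + 1, next_internal + 1)
    # Start from the star tree on leaves 0, 1, 2 with internal node n.
    yield from grow([(0, n), (1, n), (2, n)], 3, n + 1)

def edge_splits(edges, n):
    """For each edge, return the set of leaves on the side not containing leaf 0."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    splits = []
    for u, v in edges:
        # Collect everything reachable from u without crossing the edge (u, v).
        seen, stack = {u}, [u]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen and not (x == u and y == v):
                    seen.add(y)
                    stack.append(y)
        side = frozenset(x for x in seen if x < n)  # node ids below n are leaves
        if 0 in side:
            side = frozenset(range(n)) - side
        splits.append(side)
    return splits

def bme_length(edges, D, n):
    """BME length via Pauplin's formula (assumed here, see lead-in)."""
    splits = edge_splits(edges, n)
    total = 0.0
    for i, j in combinations(range(n), 2):
        # Number of edges on the path i..j = number of splits separating i and j.
        p = sum((i in s) != (j in s) for s in splits)
        total += 2.0 ** (1 - p) * D[i][j]
    return total

def bme_optimal(D):
    """Exhaustive search; only feasible for a handful of taxa."""
    n = len(D)
    return min(all_trees(n), key=lambda t: bme_length(t, D, n))

def nontrivial_splits(edges, n, keep):
    """Nontrivial splits of the tree after restricting to the leaves in `keep`."""
    keep = frozenset(keep)
    ref = min(keep)
    out = set()
    for s in edge_splits(edges, n):
        a = s & keep
        b = keep - a
        if len(a) >= 2 and len(b) >= 2:
            out.add(b if ref in a else a)  # canonical side: the one without `ref`
    return out

# Hypothetical tree metric on taxa 0..3: tree 01|23, pendant edges 1, internal edge 0.2.
D4 = [[0.0, 2.0, 2.2, 2.2],
      [2.0, 0.0, 2.2, 2.2],
      [2.2, 2.2, 0.0, 2.0],
      [2.2, 2.2, 2.0, 0.0]]
# Hypothetical distances from a fifth "rogue" taxon (index 4) to taxa 0..3.
rogue = [1.2, 3.0, 1.0, 3.0]
D5 = [row + [r] for row, r in zip(D4, rogue)] + [rogue + [0.0]]

best4 = bme_optimal(D4)
best5 = bme_optimal(D5)
before = nontrivial_splits(best4, 4, range(4))
after = nontrivial_splits(best5, 5, range(4))
print("splits before:", before)
print("splits after, restricted to original taxa:", after)
print("shared nontrivial splits:", before & after)
```

With these made-up distances the original tree has the split {0,1}|{2,3}, while the optimal five-taxon tree, restricted to the original taxa, has {0,2}|{1,3} instead, so no nontrivial split is shared. Exhaustive enumeration grows as (2n-5)!! and is only meant for toy cases like this one.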
