The Minimum-Evolution Distance-Based Approach to Phylogeny Inference

Distance algorithms remain among the most popular methods for reconstructing phylogenies, especially for researchers faced with data sets containing large numbers of taxa. Distance algorithms are much faster in practice than character or likelihood algorithms, and least-squares algorithms produce trees that have several desirable statistical properties. The fast Neighbor Joining heuristic has proven quite popular with researchers, but suffers somewhat from the lack of a statistical foundation. We show here that the balanced minimum evolution approach provides a robust statistical justification and is amenable to fast heuristics that yield topologies superior to those of other distance algorithms. The aim of this chapter is to present a comprehensive survey of the minimum evolution principle, detailing its variants, algorithms, and statistical and combinatorial properties. The focus is on the balanced version of this principle, which appears well suited to phylogenetic inference, both from a theoretical perspective and in computer simulations.
