Getting a tree fast: Neighbor Joining, FastME, and distance-based methods.

Neighbor Joining (NJ), FastME, and other distance-based programs, including BIONJ, WEIGHBOR, and (to some extent) FITCH, are fast methods for building phylogenetic trees. Their speed makes them particularly effective for large-scale studies and for bootstrap analysis, which requires runs on multiple data sets. Like maximum likelihood methods, distance methods rely on a model of sequence evolution, which is used to estimate the matrix of pairwise evolutionary distances. Computer simulations indicate that FastME has the best topological accuracy, followed by FITCH, WEIGHBOR, and BIONJ, with NJ the least accurate. Moreover, FastME is even faster than NJ on large data sets. The best distance methods are comparable in accuracy to parsimony in most cases, but become more accurate when the molecular clock is strongly violated or when long (e.g., outgroup) branches are present. This unit describes how to use distance-based methods, focusing on NJ (the most popular) and FastME (the most efficient today). It also describes how to estimate evolutionary distances from DNA and protein sequences, how to perform bootstrap analysis, and how to use CLUSTAL to compute both a sequence alignment and a phylogenetic tree.
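
As an illustration of what estimating a matrix of pairwise evolutionary distances under a substitution model can look like, here is a minimal Python sketch. It assumes the Jukes-Cantor (JC69) model, chosen only because it is the simplest case; the abstract does not say which models the unit uses. The function names (p_distance, jc69_distance, distance_matrix) and the toy sequences are hypothetical and do not come from any of the programs named above.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(a, b):
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        # The correction is undefined at saturation; report infinity.
        return float("inf")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs):
    """Build a symmetric matrix (dict of dicts) from a name -> sequence dict."""
    names = list(seqs)
    d = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d[n][m] = d[m][n] = jc69_distance(seqs[n], seqs[m])
    return names, d


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    toy = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACGTTC"}
    names, d = distance_matrix(toy)
    for n in names:
        print(n, " ".join(f"{d[n][m]:.4f}" for m in names))
```

A matrix of this kind is the input that distance-based tree builders such as NJ consume. In practice, richer models and rate variation across sites usually matter, so this sketch shows only the shape of the computation.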
