A rapid heuristic algorithm for finding minimum evolution trees.

The minimum sum of branch lengths (S), or minimum evolution (ME), principle has been shown to be a good optimization criterion in phylogenetic inference. Unfortunately, the number of topologies to be analyzed is computationally prohibitive when a large number of taxa are involved. Simplified heuristic methods, such as the neighbor-joining (NJ) method, are therefore usually employed instead. The NJ method analyzes only a small number of trees compared with the size of the entire search space, so the tree obtained may not be the ME tree (the tree whose S value is minimum over the entire search space). Different compromises between very restrictive and exhaustive search spaces have been proposed recently. In particular, the "stepwise algorithm" (SA) uses what is known in computer science as a "beam search," whereas the NJ method employs a "greedy search." SA is virtually guaranteed to find the ME trees while being much faster than exhaustive search algorithms. In this study we propose an even faster method for finding the ME tree. The new algorithm adjusts the exhaustiveness of its search (from greedy to complete) according to the statistical reliability of the tree node being reconstructed. It is also virtually guaranteed to find the ME tree. The performance and computational efficiency of ME, SA, NJ, and our new method were compared in extensive simulation studies. The new algorithm was found to perform practically as well as the SA (and, therefore, ME) method and slightly better than the NJ method. In searching for the globally optimal ME tree, the new algorithm is significantly faster than existing ones, which makes it relatively practical to obtain all trees with an S value equal to or smaller than that of the NJ tree, even when a large number of taxa are involved.
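The difference between a greedy search and a beam search can be illustrated with a minimal sketch. It is a generic toy, not the SA or the algorithm proposed here: the cost table `STEP_COST` and the helper names `expand`, `score`, and `is_complete` are hypothetical, and the "states" are short tuples of choices rather than partial tree topologies. A beam width of 1 gives a greedy search; a larger width keeps more partial solutions alive at each step.

```python
from typing import Callable, List, Tuple

State = Tuple[str, ...]

# Hypothetical toy costs: the cost of each choice depends on the choices so far.
STEP_COST = {
    (): {"a": 1, "b": 2},
    ("a",): {"c": 10, "d": 9},
    ("b",): {"c": 1, "d": 5},
}


def expand(state: State) -> List[State]:
    """Return every one-step extension of a partial solution."""
    return [state + (choice,) for choice in STEP_COST[state]]


def score(state: State) -> int:
    """Total cost of the choices made so far (lower is better)."""
    return sum(STEP_COST[state[:i]][state[i]] for i in range(len(state)))


def is_complete(state: State) -> bool:
    """A solution is complete after two choices in this toy problem."""
    return len(state) == 2


def beam_search(
    start: State,
    expand: Callable[[State], List[State]],
    score: Callable[[State], int],
    is_complete: Callable[[State], bool],
    beam_width: int,
) -> State:
    """Keep only the `beam_width` lowest-scoring candidates at each step."""
    beam = [start]
    while not all(is_complete(s) for s in beam):
        candidates: List[State] = []
        for state in beam:
            if is_complete(state):
                candidates.append(state)
            else:
                candidates.extend(expand(state))
        candidates.sort(key=score)
        beam = candidates[:beam_width]
    return min(beam, key=score)


greedy = beam_search((), expand, score, is_complete, beam_width=1)
wider = beam_search((), expand, score, is_complete, beam_width=2)
print(greedy, score(greedy))
print(wider, score(wider))
```

In this toy problem the greedy search commits to the cheapest first choice and ends with a higher total cost, whereas a beam of width 2 retains the alternative and reaches the lower-cost solution. The sketch uses a fixed beam width; varying the width from step to step according to the reliability of the node being reconstructed, as the abstract describes, is not implemented here.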
