Stochastic search strategy for estimation of maximum likelihood phylogenetic trees.

The maximum likelihood (ML) method of phylogenetic tree construction is less widely used than other tree construction methods (e.g., parsimony, neighbor-joining) because finding the ML tree takes a prohibitive amount of time when the number of sequences under consideration is large. To overcome this difficulty, we propose a stochastic search strategy for estimating the ML tree that is based on a simulated annealing algorithm. The algorithm moves through tree space using a "local rearrangement" strategy: topologies that improve the likelihood are always accepted, whereas those that decrease the likelihood are accepted with a probability that is related to the proportionate decrease in likelihood. Besides greatly reducing the time required to estimate the ML tree, the stochastic search strategy is less likely to become trapped in local optima than are existing algorithms for ML tree estimation. We demonstrate the success of the modified simulated annealing algorithm by comparing it with two existing algorithms (Swofford's PAUP* and Felsenstein's DNAMLK) on several theoretical and real data examples.
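
The acceptance rule described above can be illustrated with a minimal Python sketch. It is not the modified algorithm evaluated in the paper: the exponential form of the acceptance probability, the geometric cooling schedule, and all names (acceptance_probability, anneal, toy_loglik, propose_step) are assumptions made for illustration. Integer states stand in for tree topologies, and a step of plus or minus one stands in for a local rearrangement.

```python
import math
import random


def acceptance_probability(cur_loglik, new_loglik, temperature):
    """Probability of accepting a proposed move (assumed functional form).

    Improvements are always accepted. A worse proposal is accepted with a
    probability that shrinks as the proportionate drop in log-likelihood
    grows and as the temperature falls.
    """
    if new_loglik >= cur_loglik:
        return 1.0
    # Proportionate decrease relative to the current value (guarded against 0).
    rel_drop = (cur_loglik - new_loglik) / max(abs(cur_loglik), 1e-12)
    return math.exp(-rel_drop / temperature)


def anneal(initial, propose, loglik, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Generic simulated annealing loop that tracks the best state visited."""
    rng = random.Random(seed)
    state = best = initial
    cur = best_ll = loglik(initial)
    temperature = t0
    for _ in range(steps):
        candidate = propose(state, rng)
        new = loglik(candidate)
        if rng.random() < acceptance_probability(cur, new, temperature):
            state, cur = candidate, new
            if cur > best_ll:
                best, best_ll = state, cur
        temperature *= cooling  # geometric cooling (an assumption)
    return best, best_ll


def toy_loglik(x):
    """Hypothetical stand-in for a tree log-likelihood, maximised at x = 7."""
    return -float((x - 7) ** 2) - 1.0


def propose_step(x, rng):
    """Hypothetical stand-in for a local rearrangement: move to a neighbour."""
    return x + rng.choice((-1, 1))


if __name__ == "__main__":
    best_state, best_value = anneal(0, propose_step, toy_loglik)
    print(best_state, best_value)
```

In a real phylogenetic setting, the state would be a tree topology with branch lengths, the proposal would be a rearrangement of that tree, and the objective would be the log-likelihood of the sequence data under a substitution model; none of those components are shown here.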
