Phylogenetic tree construction using sequential stochastic approximation Monte Carlo

Monte Carlo methods have recently received much attention in the literature on phylogenetic tree construction. However, they often suffer from two difficulties: the curse of dimensionality and the local-trap problem. The former arises because the number of possible phylogenetic trees grows super-exponentially with the number of taxa. The latter arises because the energy landscape over phylogenetic trees is often rugged. In this paper, we propose a new phylogenetic tree construction method that attempts to alleviate both difficulties simultaneously by exploiting the sequential structure of phylogenetic trees in conjunction with stochastic approximation Monte Carlo (SAMC) simulations. Exploiting the sequential structure of the problem substantially helps to reduce the curse of dimensionality in the simulations, and SAMC effectively prevents the system from getting trapped in local energy minima. The new method is compared with a variety of existing Bayesian and non-Bayesian methods on simulated and real datasets. The numerical results favor the new method in terms of the quality of the resulting phylogenetic trees.
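The two difficulties named above can be made concrete with a small, self-contained Python sketch. The first function counts unrooted binary tree topologies on n labeled taxa, (2n-5)!! for n ≥ 3, to show the super-exponential growth of tree space. The second is a generic SAMC sampler, following the usual form of the method (a Metropolis move targeting exp(-U(x)/τ - θ_J(x)) followed by the weight update θ ← θ + γ_t(e_t - π)), run on a hypothetical one-dimensional rugged energy function rather than on trees. It is not the sequential tree-building scheme of this paper; the energy function, the partition edges, the gain-factor constant `t0` and all function names are illustrative assumptions.

```python
import math
import random


def num_unrooted_topologies(n_taxa: int) -> int:
    """Number of unrooted binary tree topologies on n_taxa labeled leaves: (2n-5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count


def samc_toy(energy, partition_edges, iters=20000, t0=1000, tau=1.0, seed=0):
    """Generic SAMC on a ring of discrete states (hypothetical stand-in for tree space).

    The energy range is split into m = len(partition_edges) + 1 subregions.
    Each step is a Metropolis move targeting exp(-U(x)/tau - theta[J(x)]),
    after which the log-weights are updated as theta += gamma_t * (e_t - pi).
    """
    rng = random.Random(seed)
    n_states = len(energy)
    m = len(partition_edges) + 1
    theta = [0.0] * m          # log-weights of the energy subregions
    pi = [1.0 / m] * m         # desired sampling frequency: uniform over subregions
    visits = [0] * m

    def region(u):
        # Index J(x) of the energy subregion containing energy value u.
        for i, edge in enumerate(partition_edges):
            if u < edge:
                return i
        return m - 1

    x = rng.randrange(n_states)
    for t in range(1, iters + 1):
        y = (x + rng.choice((-1, 1))) % n_states  # symmetric nearest-neighbor proposal
        jx, jy = region(energy[x]), region(energy[y])
        log_r = (-energy[y] / tau - theta[jy]) - (-energy[x] / tau - theta[jx])
        if log_r >= 0 or rng.random() < math.exp(log_r):
            x = y
        j = region(energy[x])
        visits[j] += 1
        gamma = t0 / max(t0, t)    # gain factor: constant up to t0, then decays like 1/t
        for i in range(m):
            theta[i] += gamma * ((1.0 if i == j else 0.0) - pi[i])
    return theta, visits


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, "taxa:", num_unrooted_topologies(n), "unrooted topologies")
    # Hypothetical rugged landscape: two periodic wells plus a linear tilt on 50 states
    energy = [3.0 * math.cos(2 * math.pi * i / 25) + 0.1 * i for i in range(50)]
    theta, visits = samc_toy(energy, partition_edges=[-1.0, 1.0, 3.0, 5.0])
    print("visits per energy subregion:", visits)
```

Because the visited subregion's log-weight rises while the others fall, the chain is penalized for lingering in any one energy band, so visits tend toward the uniform target π and the sampler keeps climbing out of low-energy wells. This is the mechanism the abstract credits with avoiding local traps; combining it with a sequential build-up of trees is not attempted here.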