Phylogenetic Tree Construction Using Markov Chain Monte Carlo

Abstract

We describe a Bayesian method based on Markov chain simulation to study the phylogenetic relationships among a group of DNA sequences. Under simple models of mutational events, our method produces a Markov chain whose stationary distribution is the conditional distribution of the phylogeny given the observed sequences. Our algorithm strikes a reasonable balance between the desire to move globally through the space of phylogenies and the need to make computationally feasible moves in areas of high probability. Because phylogenetic information is described by a tree, we have created new diagnostics to handle this type of data structure. An important byproduct of the Markov chain Monte Carlo phylogeny-building technique is that it provides estimates, with corresponding measures of variability, for any aspect of the phylogeny under study.
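The idea of a chain whose stationary distribution is the posterior can be illustrated on the smallest possible case. The sketch below is not the algorithm described in this paper: it assumes a two-sequence "tree" with a single branch length, the Jukes-Cantor substitution model, an exponential prior, and a random-walk Metropolis proposal. The function names, the prior rate, the step size, and the site counts are all hypothetical choices made for illustration.

```python
import math
import random


def log_posterior(d, n_sites, n_diff, prior_rate=1.0):
    """Unnormalized log posterior of the distance d (substitutions per site)
    between two aligned sequences under Jukes-Cantor, with an assumed
    Exponential(prior_rate) prior on d."""
    if d <= 0.0:
        return float("-inf")
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay  # probability a site carries the same base
    p_diff = 0.25 - 0.25 * decay  # probability of one specific different base
    if p_diff <= 0.0:             # guard against underflow for tiny d
        return float("-inf")
    return ((n_sites - n_diff) * math.log(p_same)
            + n_diff * math.log(p_diff)
            - prior_rate * d)


def sample_distance(n_sites, n_diff, n_iter=20000, burn_in=2000,
                    step=0.02, seed=1):
    """Random-walk Metropolis sampler for d; returns post-burn-in draws."""
    rng = random.Random(seed)
    d = 0.1                                   # arbitrary starting value
    current = log_posterior(d, n_sites, n_diff)
    draws = []
    for i in range(n_iter):
        # Symmetric proposal: uniform step, reflected at zero.
        proposal = abs(d + rng.uniform(-step, step))
        candidate = log_posterior(proposal, n_sites, n_diff)
        # Metropolis rule: accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            d, current = proposal, candidate
        if i >= burn_in:
            draws.append(d)
    return draws


def summarize(draws):
    """Posterior mean and central 95% interval from the sampled draws."""
    ordered = sorted(draws)
    mean = sum(ordered) / len(ordered)
    lo = ordered[int(0.025 * len(ordered))]
    hi = ordered[int(0.975 * len(ordered))]
    return mean, lo, hi


# Hypothetical data: 1000 aligned sites, 100 of which differ.
mean, lo, hi = summarize(sample_distance(n_sites=1000, n_diff=100))
print(f"posterior mean {mean:.4f}, 95% interval ({lo:.4f}, {hi:.4f})")
```

The same draws give both a point estimate and an interval, which is the sense in which a Markov chain sampler yields estimates and measures of variability together. A sampler over full phylogenies would additionally need proposals that change the tree topology, which this toy case omits.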
