Investigating Convergence of Markov Chain Monte Carlo Methods for Bayesian Phylogenetic Inference

In biology, it is often of interest to infer the evolutionary history that gave rise to an existing group of individuals, such as species or genes. This history is most often represented by a phylogenetic tree. Many methods for inferring evolutionary relationships have been proposed. As advances in computing have made Bayesian inference more practical, it has become an increasingly popular approach to phylogenetic inference. In Bayesian inference, the posterior density often cannot be written out in full because its normalizing constant is intractable. One way around this is to use a Markov chain Monte Carlo (MCMC) method, which constructs a Markov chain whose stationary distribution is the posterior distribution. After sufficiently many (possibly very many) iterations, the chain has approximately converged to this stationary distribution. From then on, the states of the chain form an approximate, though correlated, sample from the posterior, which makes Bayesian inference possible. The central question when using MCMC methods is how long the chain must be run before sampling can begin, i.e., the mixing time of the chain. Many methods exist that aim to answer this question by
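As an illustration of these ideas, and not the method studied in this work, the following minimal Python sketch runs a Metropolis chain on a toy state space. It assumes six hypothetical states standing in for tree topologies. The states are arranged in a cycle with a symmetric nearest-neighbour proposal and have made-up unnormalized posterior weights. The helper names (`metropolis_matrix`, `mixing_time`, `run_chain`) are likewise hypothetical. Acceptance depends only on ratios of weights, so the intractable normalizing constant is never needed. Because the state space is tiny, the sketch can also compute the exact distribution after t steps. It uses this to find the mixing time: the first t at which the worst-case total variation distance to the posterior is at most ε = 1/4, a conventional threshold.

```python
import random


def metropolis_matrix(weights):
    """Exact transition matrix of a Metropolis chain on states 0..n-1 arranged in a cycle.

    Hypothetical setup: each state stands in for a tree topology, and the proposal
    picks the left or right neighbour with probability 1/2 each (a symmetric proposal).
    A move i -> j is accepted with probability min(1, w[j] / w[i]), so only ratios of
    the unnormalized weights are used and the normalizing constant cancels.
    """
    n = len(weights)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            if j != i:
                P[i][j] += 0.5 * min(1.0, weights[j] / weights[i])
        # Rejected proposals leave the chain where it is.
        P[i][i] = 1.0 - sum(P[i][k] for k in range(n) if k != i)
    return P


def step_distribution(dist, P):
    """Advance a distribution over states by one step of the chain (row vector times P)."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def total_variation(p, q):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mixing_time(weights, eps=0.25, max_steps=10_000):
    """Smallest t such that max over start states x of TV(P^t(x, .), pi) <= eps.

    The normalized posterior pi is computed here only to measure the error; the
    chain itself never uses it. Exact computation like this is feasible only
    because the toy state space is tiny.
    """
    n = len(weights)
    z = sum(weights)
    pi = [w / z for w in weights]
    P = metropolis_matrix(weights)
    # One distribution per starting state, each initially a point mass.
    dists = [[1.0 if k == x else 0.0 for k in range(n)] for x in range(n)]
    for t in range(max_steps + 1):
        if max(total_variation(d, pi) for d in dists) <= eps:
            return t
        dists = [step_distribution(d, P) for d in dists]
    return None  # did not mix within max_steps


def run_chain(weights, start, n_steps, rng):
    """Simulate the same Metropolis chain and return the list of visited states."""
    n = len(weights)
    x, path = start, [start]
    for _ in range(n_steps):
        y = (x + rng.choice((-1, 1))) % n  # symmetric neighbour proposal
        if rng.random() < min(1.0, weights[y] / weights[x]):
            x = y  # accept the move; otherwise stay at x
        path.append(x)
    return path


if __name__ == "__main__":
    # Hypothetical unnormalized posterior weights for six "topologies".
    weights = [1.0, 2.0, 8.0, 2.0, 1.0, 0.5]
    t_mix = mixing_time(weights)
    print("mixing time (eps = 1/4):", t_mix)

    path = run_chain(weights, start=0, n_steps=50_000, rng=random.Random(1))
    kept = path[t_mix:]  # discard the first t_mix states as burn-in
    freqs = [kept.count(s) / len(kept) for s in range(len(weights))]
    print("empirical frequencies:", [round(f, 3) for f in freqs])
    print("target posterior:     ", [round(w / sum(weights), 3) for w in weights])
```

Discarding the first `t_mix` states is a simplified burn-in rule used only for illustration. For realistic tree spaces the exact distance to the posterior cannot be computed, which is the difficulty that convergence-assessment methods try to address.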