An Annealed Sequential Monte Carlo Method for Bayesian Phylogenetics

We describe an "embarrassingly parallel" method for Bayesian phylogenetic inference, annealed Sequential Monte Carlo (SMC), based on recent advances in the SMC literature such as adaptive determination of annealing parameters. The algorithm provides an approximate posterior distribution over trees and evolutionary parameters, as well as an unbiased estimator of the marginal likelihood. This unbiasedness property can be used to test the correctness of posterior simulation software. We evaluate the performance of phylogenetic annealed SMC by reviewing other computational Bayesian phylogenetic methods and comparing against them, in particular against different marginal likelihood estimation methods. Unlike previous SMC methods in phylogenetics, our annealed method can use standard Markov chain Monte Carlo (MCMC) tree moves and hence benefits from the large inventory of such moves available in the literature. Consequently, the annealed SMC method should be relatively easy to incorporate into existing phylogenetic software packages based on MCMC algorithms. We illustrate our method using simulation studies and real data analysis. [Marginal likelihood; phylogenetics; Sequential Monte Carlo.]
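The annealing idea summarized above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it replaces tree space with a one-dimensional conjugate Gaussian model (prior theta ~ N(0, 1), data y_i ~ N(theta, 1)) so that the marginal likelihood is known in closed form. All function names, the relative conditional ESS target of 0.9, the bisection rule for choosing the next annealing parameter, and resampling at every step are assumptions made for illustration. A random-walk Metropolis kernel stands in for the MCMC tree moves. Note that the unbiasedness guarantee is usually stated for a fixed annealing schedule, so an adaptive schedule like this one should be treated as an approximation.

```python
import numpy as np

def log_prior(theta):
    # Standard normal prior, vectorized over particles.
    return -0.5 * theta**2 - 0.5 * np.log(2 * np.pi)

def log_likelihood(theta, y):
    # y_i ~ N(theta, 1); returns one log-likelihood per particle.
    resid = y[None, :] - theta[:, None]
    return -0.5 * np.sum(resid**2, axis=1) - 0.5 * y.size * np.log(2 * np.pi)

def logsumexp(a):
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def relative_cess(delta, loglik):
    # Relative conditional ESS of the incremental weights L^delta,
    # assuming equally weighted particles (we resample at every step).
    logw = delta * loglik
    return np.exp(2 * logsumexp(logw) - logsumexp(2 * logw)) / loglik.size

def next_temperature(phi, loglik, target=0.9, tol=1e-6):
    # Bisection for the largest step whose relative CESS stays near the target.
    if relative_cess(1.0 - phi, loglik) >= target:
        return 1.0
    lo, hi = 0.0, 1.0 - phi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if relative_cess(mid, loglik) >= target:
            lo = mid
        else:
            hi = mid
    # Return the upper bracket so the schedule always makes progress.
    return min(1.0, phi + hi)

def annealed_smc(y, n_particles=2000, n_mcmc=2, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_particles)  # initial particles from the prior
    loglik = log_likelihood(theta, y)
    phi, log_z, schedule = 0.0, 0.0, [0.0]
    while phi < 1.0:
        new_phi = next_temperature(phi, loglik)
        # Incremental weights use the particles from the previous temperature.
        logw = (new_phi - phi) * loglik
        log_z += logsumexp(logw) - np.log(n_particles)
        # Multinomial resampling proportional to the incremental weights.
        probs = np.exp(logw - logsumexp(logw))
        probs /= probs.sum()
        idx = rng.choice(n_particles, size=n_particles, p=probs)
        theta, loglik = theta[idx], loglik[idx]
        phi = new_phi
        # Random-walk Metropolis moves that leave prior * L^phi invariant.
        for _ in range(n_mcmc):
            prop = theta + step * rng.standard_normal(n_particles)
            prop_ll = log_likelihood(prop, y)
            log_ratio = (log_prior(prop) + phi * prop_ll
                         - log_prior(theta) - phi * loglik)
            accept = np.log(rng.uniform(size=n_particles)) < log_ratio
            theta = np.where(accept, prop, theta)
            loglik = np.where(accept, prop_ll, loglik)
        schedule.append(phi)
    return theta, log_z, schedule

def exact_log_marginal(y):
    # Closed form for this toy model: y ~ N(0, I + 11^T).
    n = y.size
    quad = np.sum(y**2) - np.sum(y)**2 / (n + 1)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(n + 1) - 0.5 * quad

if __name__ == "__main__":
    data = 1.0 + np.random.default_rng(1).standard_normal(20)
    particles, log_z, schedule = annealed_smc(data)
    print("SMC log marginal likelihood:", log_z)
    print("exact log marginal likelihood:", exact_log_marginal(data))
    print("number of annealing steps:", len(schedule) - 1)
```

Comparing the SMC estimate with `exact_log_marginal` on a model where the answer is known mirrors, in miniature, the use of the marginal likelihood estimator as a correctness check for posterior simulation code. Each particle's move and weight computation is independent within a step, which is where the parallelism comes from; this sketch expresses it through vectorized NumPy operations.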
