Bayesian Phylogenetic Inference Using a Combinatorial Sequential Monte Carlo Method

The application of Bayesian methods to large-scale phylogenetic problems is increasingly limited by computational issues, motivating the development of methods that can complement existing Markov chain Monte Carlo (MCMC) schemes. Sequential Monte Carlo (SMC) methods are approximate inference algorithms that have become very popular for time series models. Such methods have recently been developed to address phylogenetic inference problems, but currently available techniques apply only to a more restricted class of phylogenetic tree models than MCMC does. In this article, we propose an original combinatorial SMC (CSMC) method for approximating posterior distributions over phylogenetic trees; the method applies to a general class of models and can be easily combined with MCMC to infer evolutionary parameters. Our method relies only on the existence of a flexible partially ordered set structure and is more generally applicable to sampling problems on combinatorial spaces. We demonstrate that the proposed CSMC algorithm provides consistent estimates under weak assumptions, is computationally fast, and is easily parallelizable. Supplementary materials for this article are available online.
