Phylogenetic Inference via Sequential Monte Carlo

Abstract

Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to Bayesian inference based on Markov chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of Bayesian methods. In this paper, we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets and show how to apply this framework, which we refer to as PosetSMC, to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present an experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining it with MCMC in hybrid MCMC–SMC schemes. Software for PosetSMC is available at http://www.stat.ubc.ca/~bouchard/PosetSMC.
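To make the remark about marginal likelihoods concrete, the following is a minimal sketch of classical SMC (a bootstrap particle filter), not of PosetSMC itself. It assumes a toy two-state hidden Markov model whose parameters (INIT, TRANS, EMIT, OBS) and function names are invented purely for illustration. The product of the average particle weights gives an estimate of the marginal likelihood, which the sketch compares against the exact value from the forward algorithm.

```python
import random


def forward_likelihood(init, trans, emit, obs):
    """Exact p(obs) via the forward algorithm; used only as a reference value."""
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for y in obs[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][y]
            for s in range(n)
        ]
    return sum(alpha)


def smc_likelihood(init, trans, emit, obs, num_particles, rng):
    """Bootstrap particle filter estimate of p(obs).

    Particles are proposed from the model dynamics, weighted by the
    emission probability, and resampled (multinomially) at every step.
    The running product of mean weights estimates the marginal likelihood.
    """
    states = list(range(len(init)))
    particles = rng.choices(states, weights=init, k=num_particles)
    estimate = 1.0
    for t, y in enumerate(obs):
        if t > 0:
            # Propagate each particle through the transition kernel.
            particles = [rng.choices(states, weights=trans[x])[0] for x in particles]
        weights = [emit[x][y] for x in particles]
        estimate *= sum(weights) / num_particles
        # Resample in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=num_particles)
    return estimate


# Hypothetical toy model (assumed values, not taken from the paper).
INIT = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.2, 0.8]]
EMIT = [[0.9, 0.1], [0.3, 0.7]]
OBS = [0, 1, 1, 0]

if __name__ == "__main__":
    exact = forward_likelihood(INIT, TRANS, EMIT, OBS)
    approx = smc_likelihood(INIT, TRANS, EMIT, OBS, 10000, random.Random(0))
    print("exact:", exact, "SMC estimate:", approx)
```

In this toy setting the estimate comes for free from the weights already computed during filtering, which is the general reason SMC methods yield marginal likelihood estimates as a by-product. How PosetSMC defines its proposals and weights over partial phylogenetic states is described in the paper and is not reproduced here.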
