Effective Online Bayesian Phylogenetics via Sequential Monte Carlo with Guided Proposals

Modern infectious disease outbreak surveillance produces continuous streams of sequence data that require phylogenetic analysis as the data arrive. Current software packages for Bayesian phylogenetic inference cannot quickly incorporate new sequences as they become available, making them less useful for dynamically unfolding evolutionary stories. This limitation can be addressed by applying a class of Bayesian statistical inference algorithms called sequential Monte Carlo (SMC) to conduct online inference, wherein new data can be continuously incorporated to update the estimate of the posterior probability distribution. In this paper we describe and evaluate several different online phylogenetic sequential Monte Carlo (OPSMC) algorithms. We show that proposing new phylogenies with a density similar to the Bayesian prior performs poorly, and we develop ‘guided’ proposals that better match the proposal density to the posterior. Furthermore, we show that the simplest guided proposals can exhibit pathological behavior in some situations, leading to poor results, and that this can be resolved by heating the proposal density. The results demonstrate that, relative to the widely used MCMC-based algorithm implemented in MrBayes, the total time required to compute a series of phylogenetic posteriors as sequences arrive can be significantly reduced by the use of OPSMC, without incurring a significant loss in accuracy.
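
The idea of a heated, guided proposal can be illustrated with a toy sketch. It is not the OPSMC implementation: it assumes a new sequence can attach at one of a small, discrete set of candidate positions with a uniform prior, and that a log-likelihood is available for each candidate. The names `heated_proposal`, `extend_particle`, `effective_sample_size`, and `temperature` are hypothetical, chosen for illustration only.

```python
import math
import random


def heated_proposal(log_liks, temperature):
    """Proposal probabilities proportional to exp(log_lik / temperature).

    temperature == 1 follows the likelihood exactly; larger values flatten
    the proposal towards uniform (an assumed form of "heating").
    """
    scaled = [ll / temperature for ll in log_liks]
    peak = max(scaled)  # subtract the maximum for numerical stability
    unnorm = [math.exp(s - peak) for s in scaled]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def extend_particle(log_liks, temperature, rng):
    """Pick one candidate and return (choice, incremental log-weight).

    Importance weight = prior * likelihood / proposal, with a uniform
    prior over the candidates (an assumption of this toy example).
    """
    k = len(log_liks)
    probs = heated_proposal(log_liks, temperature)
    choice = rng.choices(range(k), weights=probs)[0]
    log_prior = -math.log(k)
    log_w = log_prior + log_liks[choice] - math.log(probs[choice])
    return choice, log_w


def effective_sample_size(log_weights):
    """Kish effective sample size computed from unnormalised log-weights."""
    peak = max(log_weights)
    w = [math.exp(lw - peak) for lw in log_weights]
    return sum(w) ** 2 / sum(x * x for x in w)


if __name__ == "__main__":
    rng = random.Random(1)
    log_liks = [-10.0, -12.0, -30.0]  # made-up values for three candidates
    for temperature in (1.0, 4.0):
        log_ws = [extend_particle(log_liks, temperature, rng)[1] for _ in range(200)]
        print(temperature, effective_sample_size(log_ws))
```

In this toy setting a temperature of 1 makes the proposal proportional to the one-step posterior, so every particle receives the same incremental weight. Higher temperatures spread proposals over more candidates at the cost of more variable weights.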
