Variational Combinatorial Sequential Monte Carlo Methods for Bayesian Phylogenetic Inference

Bayesian phylogenetic inference is often conducted via local or sequential search over topologies and branch lengths using algorithms such as random-walk Markov chain Monte Carlo (MCMC) or Combinatorial Sequential Monte Carlo (CSMC). However, when MCMC is used for evolutionary parameter learning, convergence requires long runs and exploration of the state space is inefficient. We introduce Variational Combinatorial Sequential Monte Carlo (VCSMC), a framework that establishes variational sequential search to learn distributions over intricate combinatorial structures. We then develop nested CSMC, an efficient proposal distribution for CSMC, and prove that nested CSMC is an exact approximation to the (intractable) locally optimal proposal. We use nested CSMC to define a second objective, VNCSMC, which yields tighter lower bounds than VCSMC. We show that VCSMC and VNCSMC are computationally efficient and explore higher-probability spaces than existing methods on a range of tasks.
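
The lower bounds referred to above follow a pattern common to variational sequential Monte Carlo objectives. As a rough sketch only, assume a generic $K$-particle SMC estimator $\hat{Z}$ of the marginal likelihood that is unbiased; the symbols $\hat{Z}$, $K$, $\theta$, $\phi$, $Y$ and $Q_\phi$ are illustrative notation introduced here, not taken from the abstract. Jensen's inequality then gives a bound of the form:

```latex
\mathcal{L}(\theta, \phi)
  = \mathbb{E}_{Q_\phi}\!\left[ \log \hat{Z}_{\theta,\phi}(Y) \right]
  \le \log \mathbb{E}_{Q_\phi}\!\left[ \hat{Z}_{\theta,\phi}(Y) \right]
  = \log p_\theta(Y)
```

Under this reading, a proposal closer to the locally optimal one reduces the variance of $\hat{Z}$ and so narrows the gap in the inequality, which is one plausible sense in which a nested-proposal objective could be tighter.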
