Improved Variational Bayesian Phylogenetic Inference with Normalizing Flows

Variational Bayesian phylogenetic inference (VBPI) provides a promising general variational framework for efficient estimation of phylogenetic posteriors. However, the diagonal Lognormal branch length approximation it currently uses significantly restricts the quality of the approximating distributions. In this paper, we propose a new type of VBPI, VBPI-NF, as a first step toward empowering phylogenetic posterior estimation with deep learning techniques. By handling the non-Euclidean branch length space of phylogenetic models with carefully designed permutation equivariant transformations, VBPI-NF uses normalizing flows to provide a rich family of flexible branch length distributions that generalize across different tree topologies. We show that VBPI-NF significantly improves upon the vanilla VBPI on a benchmark of challenging real-data Bayesian phylogenetic inference problems. Further investigation also reveals that the structured parameterization in these permutation equivariant transformations can provide additional amortization benefit.
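To make the idea concrete, the following is a minimal sketch, not the architecture used in VBPI-NF. It starts from a diagonal Gaussian on log branch lengths (that is, a diagonal Lognormal on the lengths) and applies one affine coupling step whose scale and shift come from a function shared across branches. The per-branch feature matrix `feats`, the pass-through `mask`, the mean pooling, and the shared weights `W` and `b` are all assumptions introduced for illustration. Because the same weights act on every branch and the pooling ignores branch order, the step is permutation equivariant and applies to trees with any number of branches.

```python
import numpy as np

def coupling_layer(x, feats, mask, W, b):
    """One affine coupling step on log branch lengths (illustrative sketch).

    x:     (n,) log branch lengths
    feats: (n, d) hypothetical per-branch features
    mask:  (n,) bool, True = branch passes through unchanged
    W, b:  (d + 1, 2) and (2,) weights shared by all branches
    """
    # Order-independent summary of the pass-through branches.
    pooled = x[mask].mean() if mask.any() else 0.0
    inp = np.concatenate([feats, np.full((x.shape[0], 1), pooled)], axis=1)
    out = inp @ W + b
    # Bounded log-scale for stability; pass-through branches get identity.
    log_scale = np.where(mask, 0.0, np.tanh(out[:, 0]))
    shift = np.where(mask, 0.0, out[:, 1])
    y = x * np.exp(log_scale) + shift
    # The Jacobian is triangular, so its log-determinant is the sum of log-scales.
    return y, log_scale.sum()

def sample_branch_lengths(mu, sigma, feats, mask, W, b, rng):
    """Draw one set of branch lengths and return it with its log density."""
    eps = rng.standard_normal(mu.shape)
    x = mu + sigma * eps  # diagonal Gaussian in log space
    log_q = (-0.5 * eps**2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)).sum()
    y, log_det = coupling_layer(x, feats, mask, W, b)
    log_q -= log_det      # change of variables through the flow
    lengths = np.exp(y)   # map back to positive branch lengths
    log_q -= y.sum()      # change of variables for the exp map
    return lengths, log_q
```

With an all-`True` mask the flow reduces to the identity and the sketch recovers the plain diagonal Lognormal approximation. Stacking several such steps with alternating masks would give a more flexible family.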
