Integrated Nested Laplace Approximation for Bayesian Nonparametric Phylodynamics

The goal of phylodynamics, an area at the intersection of phylogenetics and population genetics, is to reconstruct population size dynamics from genetic data. Recently, a series of nonparametric Bayesian methods has been proposed for such demographic reconstructions. These methods rely on Gaussian process priors and approximate the posterior distribution of population size trajectories via Markov chain Monte Carlo (MCMC). In this paper, we adapt integrated nested Laplace approximation (INLA), a recently proposed method of approximate Bayesian inference for latent Gaussian models, to the estimation of population size trajectories. We show that when a genealogy of sampled individuals can be reliably estimated from genetic data, INLA enjoys high accuracy and can replace MCMC entirely. We demonstrate significant gains in computational efficiency over state-of-the-art MCMC methods. We illustrate INLA-based population size inference using simulations and genealogies of hepatitis C and human influenza viruses.
