AABC: approximate approximate Bayesian computation for inference in population-genetic models.

Approximate Bayesian computation (ABC) methods perform inference on model-specific parameters of mechanistically motivated parametric models when evaluating likelihoods is difficult. Central to the success of ABC methods, which have been used frequently in biology, is computationally inexpensive simulation of data sets from the parametric model of interest. However, when simulating data sets from a model is so computationally expensive that the posterior distribution of parameters cannot be adequately sampled by ABC, inference is not straightforward. We present "approximate approximate Bayesian computation" (AABC), a class of computationally fast inference methods that extends ABC to models in which simulating data is expensive. In AABC, we first simulate from the parametric model a limited number of data sets, few enough that the simulation remains computationally feasible. Conditional on these data sets, we use a statistical model that approximates the correct parametric model and enables efficient simulation of a large number of data sets. We show that, under mild assumptions, the posterior distribution obtained by AABC converges to the posterior distribution obtained by ABC as the number of data sets simulated from the parametric model and the sample size of the observed data set increase. We demonstrate the performance of AABC on a population-genetic model of natural selection, as well as on a model of the admixture history of hybrid populations. The latter example illustrates how, in population genetics, AABC is of particular utility in scenarios that rely on conceptually straightforward but potentially slow forward-in-time simulations.
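The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy Normal(theta, 1) simulator, the uniform prior on [-5, 5], the kernel-weighted resampling used as the approximating statistical model, the sample mean as summary statistic, and all function and parameter names (`expensive_simulate`, `aabc_posterior_sample`, `bandwidth`, `accept_frac`) are assumptions chosen only for illustration.

```python
import numpy as np


def expensive_simulate(theta, n, rng):
    # Hypothetical stand-in for a slow mechanistic simulator.
    # Toy assumption: n i.i.d. draws from Normal(theta, 1).
    return rng.normal(theta, 1.0, size=n)


def aabc_posterior_sample(observed, m_expensive=200, n_cheap=5000,
                          accept_frac=0.02, bandwidth=0.3, seed=0):
    rng = np.random.default_rng(seed)
    n = observed.size

    # Stage 1: a small number of expensive simulations at prior draws.
    thetas = rng.uniform(-5.0, 5.0, size=m_expensive)
    sims = np.stack([expensive_simulate(t, n, rng) for t in thetas])

    # Stage 2: many cheap pseudo data sets, built conditional on stage 1.
    proposals = rng.uniform(-5.0, 5.0, size=n_cheap)
    obs_stat = observed.mean()
    distances = np.empty(n_cheap)
    for i, t in enumerate(proposals):
        # Epanechnikov-style kernel weights on parameter distance
        # (an assumed choice of approximating model).
        w = np.maximum(1.0 - ((thetas - t) / bandwidth) ** 2, 0.0)
        if w.sum() == 0.0:
            # No simulated parameter within the bandwidth:
            # fall back to the nearest one.
            w[np.argmin(np.abs(thetas - t))] = 1.0
        w /= w.sum()
        # Resample individual observations from the weighted data sets.
        rows = rng.choice(m_expensive, size=n, p=w)
        cols = rng.integers(0, n, size=n)
        pseudo = sims[rows, cols]
        distances[i] = abs(pseudo.mean() - obs_stat)

    # Stage 3: standard ABC rejection on the cheap pseudo data sets,
    # keeping the proposals whose summary statistic is closest.
    k = max(1, int(accept_frac * n_cheap))
    keep = np.argsort(distances)[:k]
    return proposals[keep]
```

In this sketch the expensive simulator is called only `m_expensive` times, whereas the rejection step draws on `n_cheap` pseudo data sets; the quality of the result depends on how well the resampling scheme approximates the true model near each proposed parameter value.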
