Approximate Bayesian Computation: A Nonparametric Perspective

Approximate Bayesian Computation (ABC) is a family of likelihood-free inference techniques that are well suited to models defined by a stochastic generating mechanism. In a nutshell, ABC computes summary statistics $s_{obs}$ from the observed data and simulates summary statistics for different values of the parameter $\Theta$. The posterior distribution is then approximated by an estimator of the conditional density $g(\Theta \mid s_{obs})$. In this paper, we derive the asymptotic bias and variance of the standard estimators of the posterior distribution, which are based on rejection sampling and linear adjustment. Additionally, we introduce an original estimator of the posterior distribution based on quadratic adjustment, and we show that its bias contains fewer terms than the bias of the estimator with linear adjustment. Although we find that the estimators with adjustment are not universally superior to the estimator based on rejection sampling, they can achieve better performance when the relationship between the summary statistics and the parameter of interest is nearly homoscedastic. To make this relationship as homoscedastic as possible, we propose using transformations of the summary statistics. In several examples borrowed from the population genetics and epidemiology literature, we show the potential of the methods with adjustment and of the transformations of the summary statistics. Supplemental materials containing the details of the proofs are available online.
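The three estimators compared above (rejection sampling, linear adjustment, quadratic adjustment) can be sketched in a few lines. This is a minimal sketch, not the paper's implementation. It assumes a hypothetical toy model ($\Theta \sim$ Uniform(-10, 10), 20 observations from Normal($\Theta$, 1), and the sample mean as the single summary statistic), an Epanechnikov kernel, and a bandwidth set to the 5% quantile of the distances $|s_i - s_{obs}|$. All function names are hypothetical. The linear adjustment replaces each accepted $\theta_i$ by $\theta_i - \hat\beta (s_i - s_{obs})$, where $\hat\beta$ comes from a kernel-weighted least-squares fit of $\theta$ on $s - s_{obs}$. The quadratic variant also subtracts the fitted second-order term.

```python
import math
import random


def epanechnikov(u):
    """Unnormalised Epanechnikov kernel; the constant cancels in weighted estimates."""
    return 1.0 - u * u if abs(u) < 1.0 else 0.0


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_least_squares(rows, y, w):
    """Minimise sum_i w_i * (y_i - rows_i . coef)^2 through the normal equations."""
    p = len(rows[0])
    xtwx = [[sum(wi * r[j] * r[k] for r, wi in zip(rows, w)) for k in range(p)]
            for j in range(p)]
    xtwy = [sum(wi * r[j] * yi for r, yi, wi in zip(rows, y, w)) for j in range(p)]
    return solve(xtwx, xtwy)


def abc(s_obs, n_sims=20000, accept_frac=0.05, n_obs=20, seed=1):
    """Hypothetical toy ABC: theta ~ U(-10, 10), data ~ N(theta, 1), summary = sample mean."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        theta = rng.uniform(-10.0, 10.0)  # draw from the (assumed) prior
        s = sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs  # simulated summary
        sims.append((theta, s))
    # Bandwidth delta: the accept_frac quantile of the distances |s_i - s_obs|.
    delta = sorted(abs(s - s_obs) for _, s in sims)[int(accept_frac * n_sims)]
    # Keep simulations inside the window and give them Epanechnikov weights.
    kept = [(t, s - s_obs) for t, s in sims if abs(s - s_obs) < delta]
    rejection = [t for t, _ in kept]
    d = [di for _, di in kept]
    w = [epanechnikov(di / delta) for di in d]
    # Linear adjustment: theta_i - beta_hat * (s_i - s_obs).
    _, beta = weighted_least_squares([[1.0, di] for di in d], rejection, w)
    linear = [t - beta * di for t, di in zip(rejection, d)]
    # Quadratic adjustment: also remove the fitted second-order term.
    _, b1, b2 = weighted_least_squares([[1.0, di, di * di] for di in d], rejection, w)
    quadratic = [t - b1 * di - b2 * di * di for t, di in zip(rejection, d)]
    return rejection, linear, quadratic, w


def weighted_mean(xs, w):
    return sum(x * wi for x, wi in zip(xs, w)) / sum(w)


def weighted_sd(xs, w):
    m = weighted_mean(xs, w)
    return math.sqrt(sum(wi * (x - m) ** 2 for x, wi in zip(xs, w)) / sum(w))


if __name__ == "__main__":
    rejection, linear, quadratic, w = abc(s_obs=1.0)
    for name, xs in [("rejection", rejection), ("linear", linear), ("quadratic", quadratic)]:
        print(name, round(weighted_mean(xs, w), 3), round(weighted_sd(xs, w), 3))
```

In this toy model the relationship between $\Theta$ and the summary is linear and homoscedastic. The posterior is approximately Normal with mean $s_{obs}$ and standard deviation $1/\sqrt{20} \approx 0.224$. One would therefore expect the adjusted samples to be more concentrated than the rejection sample, whose spread is inflated by the width of the acceptance window. The sketch leaves the summaries untransformed. A transformation of the kind proposed above could be tried by applying it to both $s_i$ and $s_{obs}$ before the regression step.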
