Constructing summary statistics for approximate Bayesian computation: semi‐automatic approximate Bayesian computation

Many modern statistical applications involve inference for complex stochastic models, where it is easy to simulate from the models but impossible to calculate likelihoods. Approximate Bayesian computation (ABC) is a method of inference for such models. It replaces calculation of the likelihood by a step which involves simulating artificial data for different parameter values and comparing summary statistics of the simulated data with summary statistics of the observed data. Here we show how to construct appropriate summary statistics for ABC in a semi-automatic manner. We aim for summary statistics which will enable inference about certain parameters of interest to be as accurate as possible. Theoretical results show that optimal summary statistics are the posterior means of the parameters. Although these cannot be calculated analytically, we use an extra stage of simulation to estimate how the posterior means vary as a function of the data, and we then use these estimates of our summary statistics within ABC. Empirical results show that our approach is a robust method for choosing summary statistics that can result in substantially more accurate ABC analyses than the ad hoc choices of summary statistics that have been proposed in the literature. We also demonstrate advantages over two alternative methods of simulation-based inference.
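
The two-stage idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy model (independent Normal(theta, 1) observations), the uniform prior, the choice of order statistics as regression features, and all function names (`simulate`, `features`, `fit_summary`, `rejection_abc`, `uniform_prior`) are assumptions made for illustration. An extra stage of simulation is used to fit a linear regression of the parameter on features of the data; the fitted value serves as an estimate of the posterior mean and is then used as the summary statistic in a plain rejection-ABC step.

```python
import numpy as np


def simulate(theta, n, rng):
    """Toy model (assumed for illustration): n draws from Normal(theta, 1)."""
    return rng.normal(loc=theta, scale=1.0, size=n)


def features(y):
    """Regression features f(y): an intercept followed by the order statistics."""
    return np.concatenate(([1.0], np.sort(y)))


def uniform_prior(size, rng):
    """Assumed prior: theta ~ Uniform(-5, 5)."""
    return rng.uniform(-5.0, 5.0, size=size)


def fit_summary(n, n_train, prior_sampler, rng):
    """Extra simulation stage: regress theta on f(y) by least squares.

    The fitted linear predictor is an estimate of E[theta | y] and is
    returned as the summary statistic S(y).
    """
    thetas = prior_sampler(n_train, rng)
    X = np.array([features(simulate(t, n, rng)) for t in thetas])
    coef, *_ = np.linalg.lstsq(X, thetas, rcond=None)
    return lambda y: float(features(y) @ coef)


def rejection_abc(y_obs, summary, n_sims, keep_frac, prior_sampler, rng):
    """Rejection ABC: keep the prior draws whose simulated summary is closest."""
    thetas = prior_sampler(n_sims, rng)
    s_obs = summary(y_obs)
    dists = np.array(
        [abs(summary(simulate(t, len(y_obs), rng)) - s_obs) for t in thetas]
    )
    n_keep = max(1, int(keep_frac * n_sims))
    return thetas[np.argsort(dists)[:n_keep]]


rng = np.random.default_rng(0)
y_obs = simulate(2.0, 20, rng)  # "observed" data generated with theta = 2
summary = fit_summary(20, 2000, uniform_prior, rng)
posterior_sample = rejection_abc(y_obs, summary, 5000, 0.02, uniform_prior, rng)
print("ABC posterior mean:", posterior_sample.mean())
print("observed sample mean:", y_obs.mean())
```

In this toy setting the posterior mean is close to the sample mean, so the two printed values should roughly agree. For a model with an intractable likelihood, only `simulate` and `features` would need to change.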
