Approximate Bayesian Computation Via the Energy Statistic

Approximate Bayesian computation (ABC) has become an essential part of the Bayesian toolbox for problems in which the likelihood is intractable, either because it is prohibitively expensive to evaluate or because it is entirely unknown. ABC defines a pseudo-posterior by comparing observed data with simulated data, traditionally through summary statistics, the elicitation of which is regarded as a key difficulty. Recently, data discrepancy measures have been proposed as a way to bypass the construction of summary statistics. Here we propose to use the importance-sampling ABC (IS-ABC) algorithm with the so-called two-sample energy statistic. We establish a new asymptotic result for the case where both the observed sample size and the simulated sample size increase to infinity, which highlights the extent to which the data discrepancy measure affects the asymptotic pseudo-posterior. The result holds in the broad setting of IS-ABC methodologies, thus generalizing previous results that have been established only for rejection ABC algorithms. Furthermore, we propose a consistent V-statistic estimator of the energy statistic, under which we show that the large-sample result holds. We also prove that, in the finite-sample setting, the rejection ABC algorithm based on the energy statistic generates pseudo-posterior distributions that converge to the correct limits when implemented with rejection thresholds that converge to zero. Our proposed energy-statistic-based ABC algorithm is demonstrated on a variety of models, including a Gaussian mixture, a moving-average model of order two, a bivariate beta distribution, and a multivariate $g$-and-$k$ distribution. We find that our proposed method compares well with alternative discrepancy measures.
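
As a rough illustration of the rejection variant described above, the following sketch compares observed and simulated samples with a V-statistic form of the energy statistic, $2\,\overline{\|x - y\|} - \overline{\|x - x'\|} - \overline{\|y - y'\|}$, where each bar denotes an average over all pairs of sample points. This is the form commonly given in the energy-statistics literature and is assumed here rather than taken from the paper. The function names, the Gaussian location model, the uniform prior, and the threshold value are all hypothetical choices made for the example; the sketch does not implement the IS-ABC algorithm or reproduce any of the paper's experiments.

```python
import math
import random


def energy_statistic(x, y):
    """V-statistic estimate of the two-sample energy distance.

    x and y are lists of points, each point a tuple of floats.
    Assumed form: 2 * mean|x - y| - mean|x - x'| - mean|y - y'|.
    """
    def mean_dist(a, b):
        # Average Euclidean distance over all pairs (including i == j).
        return sum(math.dist(u, v) for u in a for v in b) / (len(a) * len(b))

    return 2.0 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)


def rejection_abc(observed, prior_sampler, simulator, n_sim, n_draws, eps, rng):
    """Keep the prior draws whose simulated data lie within eps of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        simulated = simulator(theta, n_sim, rng)
        if energy_statistic(observed, simulated) <= eps:
            accepted.append(theta)
    return accepted


# Hypothetical toy problem: infer the mean of a unit-variance Gaussian.
rng = random.Random(0)
observed = [(rng.gauss(1.0, 1.0),) for _ in range(30)]

accepted = rejection_abc(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),  # assumed prior
    simulator=lambda theta, m, r: [(r.gauss(theta, 1.0),) for _ in range(m)],
    n_sim=30,
    n_draws=300,
    eps=0.1,  # assumed threshold; smaller values accept fewer draws
    rng=rng,
)
print(len(accepted), "draws accepted out of 300")
```

The accepted values of `theta` form a sample from the pseudo-posterior for this toy model. Evaluating the statistic this way costs on the order of the product of the two sample sizes per simulation, so the sketch is only practical for small samples.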
