Detecting conflicting summary statistics in likelihood-free inference

Bayesian likelihood-free methods implement Bayesian inference by simulating data from the model in place of evaluating an intractable likelihood. Most likelihood-free inference methods replace the full data set with a summary statistic before performing Bayesian inference, and the choice of this statistic is often difficult: it should be low-dimensional for computational reasons, while retaining as much information as possible about the parameter. Using a recent idea from the interpretable machine learning literature, we develop regression-based diagnostic methods for detecting when different parts of a summary statistic vector contain conflicting information about the model parameters. Conflicts of this kind complicate summary statistic choice, and detecting them can reveal model deficiencies and guide model improvement. The diagnostics are based on regression approaches to likelihood-free inference, in which the regression model estimates the posterior density using summary statistics as features. Deleting part of the summary statistic vector and imputing it within the regression model can remove conflicts and approximate posterior distributions for summary statistic subsets. A larger-than-expected change in the estimated posterior density following deletion and imputation can indicate a conflict in which inferences of interest are affected. The usefulness of the new methods is demonstrated in several real examples.
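To make the deletion-and-imputation idea concrete, the sketch below illustrates it on a toy problem. It is not the paper's regression density estimation method: the Gaussian model, the two summaries (sample mean and median), the k-nearest-neighbour ABC approximation of the posterior, the random-forest imputer, and all tuning constants are illustrative assumptions chosen only to show the mechanics of deleting one summary, imputing it from the rest, and comparing the resulting approximate posteriors.

```python
# Minimal sketch (not the paper's exact method) of a deletion-and-imputation
# diagnostic for conflicting summary statistics.  A kNN-ABC approximation
# stands in for the paper's regression density estimators.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def simulate(theta, n=50):
    """Toy model: n i.i.d. N(theta, 1) observations."""
    return rng.normal(theta, 1.0, size=n)

def summaries(y):
    """Two summaries of the data: the sample mean and the sample median."""
    return np.array([y.mean(), np.median(y)])

# Reference table of (theta, summary) pairs simulated under the prior.
n_sim = 20000
theta_prior = rng.normal(0.0, 5.0, size=n_sim)            # assumed prior: theta ~ N(0, 25)
S = np.array([summaries(simulate(t)) for t in theta_prior])

# An "observed" data set whose summaries conflict: a few contaminated
# observations pull the mean away from the median.
y_obs = simulate(1.0)
y_obs[:5] += 15.0
s_obs = summaries(y_obs)

def knn_posterior(s_target, S_sim, theta_sim, k=200):
    """Crude ABC posterior sample: thetas of the k nearest simulated summaries."""
    scale = S_sim.std(axis=0)
    d = np.linalg.norm((S_sim - s_target) / scale, axis=1)
    return theta_sim[np.argsort(d)[:k]]

# Approximate posterior using the full summary vector.
post_full = knn_posterior(s_obs, S, theta_prior)

# Delete the first summary (the mean) and impute it from the second (the
# median), using a regression fitted on the simulated reference table,
# i.e. an estimate of E[s1 | s2] under the prior predictive.
imputer = RandomForestRegressor(n_estimators=200, random_state=0)
imputer.fit(S[:, [1]], S[:, 0])
s_imputed = s_obs.copy()
s_imputed[0] = imputer.predict(s_obs[[1]].reshape(1, -1))[0]

# Approximate posterior after deletion and imputation.
post_imputed = knn_posterior(s_imputed, S, theta_prior)

# A large shift relative to the posterior spread flags a conflict between
# the deleted summary and the remaining ones.
print("posterior mean, full summaries:", post_full.mean().round(2))
print("posterior mean, mean imputed  :", post_imputed.mean().round(2))
print("shift / posterior sd          :",
      abs(post_full.mean() - post_imputed.mean()) / post_full.std())
```

In this toy setting the contamination moves the sample mean but barely affects the median, so imputing the mean from the median shifts the approximate posterior noticeably; that larger-than-expected change is the kind of signal the diagnostics in the paper are designed to detect.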
