Validation of Approximate Likelihood and Emulator Models for Computationally Intensive Simulations

Complex phenomena in engineering and the sciences are often modeled with computationally intensive feed-forward simulations for which a tractable analytic likelihood does not exist. In these cases, it is sometimes necessary to estimate an approximate likelihood or fit a fast emulator model for efficient statistical inference; such surrogate models include Gaussian synthetic likelihoods and, more recently, neural density estimators such as autoregressive models and normalizing flows. To date, however, there is no consistent way of quantifying the quality of such a fit. Here we propose a statistical framework that can distinguish any misspecified model from the target likelihood and that, in addition, can identify with statistical confidence the regions of both parameter space and feature space where the fit is inadequate. Our validation method applies to settings where simulations are extremely costly and are generated in batches, or "ensembles", at fixed locations in parameter space. At the heart of our approach are a two-sample test that quantifies the quality of the fit at fixed parameter values and a global test that assesses goodness-of-fit across simulation parameters. While our general framework can incorporate any test statistic or distance metric, we specifically argue for a new two-sample test that can leverage any regression method to attain high power and provide diagnostics in complex data settings.
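
To make the local step concrete, the following is a minimal sketch of a regression-based two-sample test at one fixed parameter value. It is an illustration under stated assumptions, not the implementation used in the paper: the function names are hypothetical, ordinary least squares on linear and squared features stands in for "any regression method", and the p-value is calibrated by permuting the sample labels. The statistic is the mean squared deviation of the fitted probability that a point came from the emulator from the overall emulator proportion; it is near zero when the two samples are indistinguishable.

```python
import numpy as np


def _features(x):
    # Design matrix with intercept, linear, and squared terms.
    # Assumption: this simple basis stands in for an arbitrary regression method.
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    return np.hstack([np.ones((x.shape[0], 1)), x, x ** 2])


def regression_statistic(x, y):
    # Regress the sample label y (0 = simulator, 1 = emulator) on x and
    # measure how far the fitted values move away from the label mean.
    design = _features(x)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    return float(np.mean((fitted - y.mean()) ** 2))


def regression_two_sample_test(sim, emu, n_perm=200, seed=0):
    # Hypothetical helper: returns (statistic, permutation p-value) for the
    # null hypothesis that sim and emu come from the same distribution.
    rng = np.random.default_rng(seed)
    x = np.concatenate([sim, emu])
    y = np.concatenate([np.zeros(len(sim)), np.ones(len(emu))])
    observed = regression_statistic(x, y)
    # Permuting the labels breaks any dependence between x and y,
    # which gives draws of the statistic under the null.
    permuted = np.array(
        [regression_statistic(x, rng.permutation(y)) for _ in range(n_perm)]
    )
    p_value = (1 + np.sum(permuted >= observed)) / (n_perm + 1)
    return observed, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    simulator_draws = rng.normal(0.0, 1.0, size=300)  # stand-in for costly simulations
    emulator_draws = rng.normal(0.0, 2.0, size=300)   # deliberately misspecified variance
    stat, p = regression_two_sample_test(simulator_draws, emulator_draws)
    print(f"statistic={stat:.4f}, p-value={p:.4f}")
```

Because the fitted values depend on the features, inspecting where they depart most from the label mean indicates the regions of feature space in which the emulator and the simulator disagree. Repeating the test at each ensemble location in parameter space yields one p-value per location, which a global test can then combine.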
