Likelihood-Free Inference by Ratio Estimation

We consider the problem of parametric statistical inference when likelihood computations are prohibitively expensive but sampling from the model is possible. Several so-called likelihood-free methods have been developed to perform inference in the absence of a likelihood function. The popular synthetic likelihood approach infers the parameters by modelling summary statistics of the data with a Gaussian probability distribution. In another popular approach, approximate Bayesian computation, inference is performed by identifying parameter values for which the summary statistics of the simulated data are close to those of the observed data. Synthetic likelihood is easier to use because no measure of "closeness" is required, but the Gaussianity assumption is often limiting. Moreover, both approaches require judiciously chosen summary statistics. Here we present an alternative inference approach that is as easy to use as synthetic likelihood but less restrictive in its assumptions, and that naturally enables automatic selection of relevant summary statistics from a large set of candidates. The basic idea is to frame the problem of estimating the posterior as a problem of estimating the ratio between the data-generating distribution and the marginal distribution. This problem can be solved by logistic regression, and including regularising penalty terms enables automatic selection of the summary statistics relevant to the inference task. We illustrate the general theory on canonical examples and employ it to perform inference for challenging stochastic nonlinear dynamical systems and high-dimensional summary statistics.
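The ratio-estimation idea can be illustrated with a minimal sketch under stated assumptions. The toy model below (a Gaussian simulator with unit variance and a zero-mean Gaussian prior), the helper names `design` and `fit_logistic`, and the hand-picked features 1, x, x^2 are all hypothetical choices made for illustration and are not taken from the paper. The sketch also omits the regularising penalty, so it shows only the ratio-estimation step, not the automatic selection of summary statistics. A classifier is trained to tell samples from the data-generating distribution p(x | theta) apart from samples from the marginal p(x); with equal class sizes, its logit approximates log p(x | theta) - log p(x), and the posterior at theta is the prior multiplied by the exponential of that log-ratio.

```python
import numpy as np


def design(x):
    """Hypothetical summary statistics: intercept, x and x^2."""
    return np.column_stack([np.ones_like(x), x, x ** 2])


def fit_logistic(features, labels, n_iter=25):
    """Unpenalised logistic regression fitted by Newton's method."""
    weights = np.zeros(features.shape[1])
    for _ in range(n_iter):
        probs = 1.0 / (1.0 + np.exp(-(features @ weights)))
        gradient = features.T @ (probs - labels)
        curvature = probs * (1.0 - probs)
        hessian = features.T @ (features * curvature[:, None])
        weights = weights - np.linalg.solve(hessian, gradient)
    return weights


rng = np.random.default_rng(0)
theta = 1.0        # parameter value at which the ratio is estimated
prior_std = 2.0    # assumed prior: theta ~ N(0, prior_std^2)
n = 20000          # simulations per class

# Class 1: data simulated at the fixed parameter, x ~ p(x | theta).
x_theta = rng.normal(theta, 1.0, n)
# Class 0: data simulated from the marginal p(x), i.e. draw theta from
# the prior first and then simulate x given that draw.
x_marginal = rng.normal(rng.normal(0.0, prior_std, n), 1.0)

features = np.vstack([design(x_theta), design(x_marginal)])
labels = np.concatenate([np.ones(n), np.zeros(n)])
weights = fit_logistic(features, labels)

# For this toy model the log-ratio is known in closed form and is
# quadratic in x, so the fitted weights can be checked against it.
marginal_var = 1.0 + prior_std ** 2
true_weights = np.array([
    0.5 * np.log(marginal_var) - 0.5 * theta ** 2,
    theta,
    -0.5 + 0.5 / marginal_var,
])

x_observed = 0.5
log_ratio = (design(np.array([x_observed])) @ weights)[0]
print("estimated weights:", weights)
print("analytic weights: ", true_weights)
print("estimated log-ratio at x_observed:", log_ratio)
```

In this sketch the classifier has to be refitted for every value of theta at which the posterior is evaluated. Adding an L1 penalty to the logistic loss would be one way to obtain the sparse weights that the abstract associates with selecting relevant summary statistics.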
