Distilling importance sampling

The two main approaches to Bayesian inference are sampling and optimisation methods. However, many complicated posteriors are difficult to approximate well by either. We therefore propose a novel approach combining features of both. We use a flexible parameterised family of densities, such as a normalising flow. Given a density from this family that approximates the posterior, we use importance sampling to produce a weighted sample from a more accurate posterior approximation. This sample is then used in optimisation to update the parameters of the approximate density, which we view as distilling the importance sampling results. We iterate these steps and gradually improve the quality of the posterior approximation. We illustrate our method on two challenging examples: a queueing model and a stochastic differential equation model.
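
The abstract describes the iterative scheme only at a high level. The sketch below is one possible reading of that loop in PyTorch, assuming the approximate family is a normalising flow object exposing `sample_and_log_prob` and `log_prob` methods (as in the nflows library) and that `log_target` evaluates the unnormalised log posterior; it is an illustrative sketch under these assumptions, not the authors' implementation, and omits refinements such as tempering or weight clipping.

```python
import torch

def distill_importance_sampling(flow, log_target, n_iters=200, n_samples=500, lr=1e-3):
    """Sketch of the loop described in the abstract.

    Assumptions (hypothetical, for illustration only):
    - `flow` is a torch.nn.Module normalising flow with
      `sample_and_log_prob(n)` and `log_prob(x)` methods.
    - `log_target(x)` returns the unnormalised log posterior density
      for a batch of samples x.
    """
    opt = torch.optim.Adam(flow.parameters(), lr=lr)
    for _ in range(n_iters):
        # Importance sampling step: draw from the current approximation
        # and weight by target/proposal, so the weighted sample represents
        # a more accurate posterior approximation.
        with torch.no_grad():
            x, log_q = flow.sample_and_log_prob(n_samples)
            log_w = log_target(x) - log_q
            w = torch.softmax(log_w, dim=0)  # self-normalised weights
        # Distillation step: update the flow parameters by maximising the
        # weighted log-likelihood of the sample (a forward-KL-style fit).
        opt.zero_grad()
        loss = -(w * flow.log_prob(x)).sum()
        loss.backward()
        opt.step()
    return flow
```

Iterating the two steps lets the proposal and the importance sampling approximation improve together, which is the "distilling" idea the abstract refers to.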
