Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms

Variational inference using the reparameterization trick has enabled large-scale approximate Bayesian inference in complex probabilistic models, leveraging stochastic optimization to sidestep intractable expectations. The reparameterization trick is applicable when we can simulate a random variable by applying a differentiable deterministic function to an auxiliary random variable whose distribution is fixed. For many distributions of interest (such as the gamma or Dirichlet), simulation of random variables relies on acceptance-rejection sampling. The discontinuity introduced by the accept-reject step means that standard reparameterization tricks are not applicable. We propose a new method that lets us leverage reparameterization gradients even when variables are outputs of an acceptance-rejection sampling algorithm. Our approach enables reparameterization on a larger class of variational distributions. In several studies of real and synthetic data, we show that the variance of the gradient estimator is significantly lower than that of other state-of-the-art methods. This leads to faster convergence of stochastic gradient variational inference.
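To make the setting in the abstract concrete, the following is a minimal sketch (not the authors' code) of an acceptance-rejection sampler for a Gamma(alpha, 1) variable with alpha >= 1, using the Marsaglia and Tsang construction. The function names `h`, `dh_dalpha`, and `sample_gamma` are illustrative assumptions. The sketch shows only that each accepted draw is a differentiable function of the auxiliary noise `eps` and the shape `alpha`, while the accept-reject test is the discontinuous part. It does not implement the full gradient estimator, which also has to account for the fact that the distribution of the accepted `eps` depends on `alpha`.

```python
import numpy as np


def h(eps, alpha):
    """Differentiable map from standard normal noise to a gamma proposal."""
    d = alpha - 1.0 / 3.0
    return d * (1.0 + eps / np.sqrt(9.0 * d)) ** 3


def dh_dalpha(eps, alpha):
    """Closed-form derivative of h with respect to alpha (eps held fixed)."""
    d = alpha - 1.0 / 3.0
    s = 1.0 + eps / np.sqrt(9.0 * d)
    return s ** 3 - eps * s ** 2 / (2.0 * np.sqrt(d))


def sample_gamma(alpha, rng):
    """Draw one Gamma(alpha, 1) sample for alpha >= 1.

    Returns the sample together with the accepted noise eps, so the
    caller can evaluate dh_dalpha at that eps.
    """
    d = alpha - 1.0 / 3.0
    while True:
        eps = rng.standard_normal()
        v = (1.0 + eps / np.sqrt(9.0 * d)) ** 3
        if v <= 0.0:
            continue  # proposal falls outside the support; try again
        u = 1.0 - rng.random()  # uniform on (0, 1], keeps log(u) finite
        # Accept-reject test: this branch is the discontinuity that
        # prevents a naive application of the reparameterization trick.
        if np.log(u) < 0.5 * eps ** 2 + d - d * v + d * np.log(v):
            return h(eps, alpha), eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z, eps = sample_gamma(2.5, rng)
    print("sample:", z, "dh/dalpha at accepted eps:", dh_dalpha(eps, 2.5))
```

Returning `eps` alongside the sample is a design choice made here so that the pathwise derivative can be evaluated at the accepted noise value without rerunning the sampler.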

Scott W. Linderman | David M. Blei | Francisco J. R. Ruiz | Christian A. Naesseth