Implicit Reparameterization Gradients

By providing a simple and efficient way of computing low-variance gradients of continuous random variables, the reparameterization trick has become the technique of choice for training a variety of latent variable models. However, it is not applicable to a number of important continuous distributions. We introduce an alternative approach to computing reparameterization gradients based on implicit differentiation and demonstrate its broader applicability by applying it to Gamma, Beta, Dirichlet, and von Mises distributions, which cannot be used with the classic reparameterization trick. Our experiments show that the proposed approach is faster and more accurate than the existing gradient estimators for these distributions.
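
As a minimal sketch of the idea, and not the authors' implementation: suppose a sample z is defined implicitly by F(z; θ) = u, where F is the CDF and u is a fixed uniform draw. Differentiating both sides with respect to θ gives dz/dθ = -(∂F/∂θ) / q(z; θ), where q is the density, so no explicit inverse of F is needed. The Python below illustrates this for a Gamma(α, 1) variable using only the standard library. The helper names, the power-series CDF, and the finite-difference approximation of ∂F/∂α are all assumptions made for illustration; a practical implementation would more likely use an accurate closed-form or automatically differentiated derivative of the CDF.

```python
import math


def gamma_cdf(z, alpha, terms=200):
    """CDF of Gamma(alpha, 1): regularized lower incomplete gamma, by power series."""
    if z <= 0.0:
        return 0.0
    term = 1.0 / alpha
    total = term
    for n in range(1, terms):
        term *= z / (alpha + n)
        total += term
    return total * math.exp(alpha * math.log(z) - z - math.lgamma(alpha))


def gamma_pdf(z, alpha):
    """Density of Gamma(alpha, 1) at z > 0."""
    return math.exp((alpha - 1.0) * math.log(z) - z - math.lgamma(alpha))


def implicit_grad(z, alpha, eps=1e-5):
    """dz/dalpha = -(dF/dalpha) / q(z; alpha).

    dF/dalpha is approximated by a central finite difference here
    (an illustrative shortcut, not an exact derivative).
    """
    dF_dalpha = (gamma_cdf(z, alpha + eps) - gamma_cdf(z, alpha - eps)) / (2.0 * eps)
    return -dF_dalpha / gamma_pdf(z, alpha)


def gamma_quantile(u, alpha, iters=200):
    """Inverse CDF by bisection; used only to check the implicit gradient."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, alpha) < u:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, alpha) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def quantile_grad_fd(u, alpha, h=1e-4):
    """Reference value: finite difference of the explicit quantile function."""
    return (gamma_quantile(u, alpha + h) - gamma_quantile(u, alpha - h)) / (2.0 * h)


if __name__ == "__main__":
    alpha, u = 2.5, 0.3
    z = gamma_quantile(u, alpha)
    print("implicit gradient:       ", implicit_grad(z, alpha))
    print("quantile finite diff ref:", quantile_grad_fd(u, alpha))
```

The two printed values should agree closely. The point of the comparison is that the implicit route needs only the CDF and the density evaluated at the sample, whereas the reference route has to invert the CDF numerically.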
