GO Gradient for Expectation-Based Objectives

Within many machine learning algorithms, a fundamental problem concerns the efficient calculation of an unbiased gradient with respect to parameters $\boldsymbol{\gamma}$ for expectation-based objectives $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}[f(\boldsymbol{y})]$. Most existing methods either (i) suffer from high variance and must resort to (often complicated) variance-reduction techniques, or (ii) apply only to reparameterizable continuous random variables via the reparameterization trick. To address these limitations, we propose a General and One-sample (GO) gradient that (i) applies to many distributions associated with non-reparameterizable continuous or discrete random variables, and (ii) matches the low variance of the reparameterization trick. In practice, the GO gradient often works well with only a single Monte Carlo sample (although more samples can of course be used if desired). Alongside the GO gradient, we develop a means of propagating the chain rule through distributions, yielding statistical back-propagation, which couples neural networks to common random variables.
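To make the variance gap concrete, below is a minimal illustrative sketch (not the paper's full multivariate estimator) for a univariate exponential distribution with rate $\lambda$ and $f(y)=y^2$, where $\nabla_{\lambda}\mathbb{E}[f(y)]=-4/\lambda^3$ in closed form. It compares the score-function (REINFORCE) estimator against a CDF-based estimator, $-\mathbb{E}\!\left[\frac{\partial_{\lambda}F(y;\lambda)}{q(y;\lambda)}\,f'(y)\right]$, which for this scalar continuous case coincides with the GO/implicit-reparameterization form; the distribution, test function, and variable names are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Test problem: y ~ Exponential(rate=lam), f(y) = y**2.
# Closed form: E[f(y)] = 2 / lam**2, so d/dlam E[f(y)] = -4 / lam**3.
lam = 1.5
true_grad = -4.0 / lam**3

def f(y):
    return y**2

def f_prime(y):
    return 2.0 * y

def score_function_grad(n):
    """Score-function (REINFORCE) estimator:
    grad = E[ f(y) * d/dlam log q(y; lam) ],
    with log q(y; lam) = log(lam) - lam*y, so the score is 1/lam - y."""
    y = rng.exponential(scale=1.0 / lam, size=n)
    return f(y) * (1.0 / lam - y)

def cdf_based_grad(n):
    """CDF-based single-sample estimator (scalar continuous GO form):
    grad = -E[ (d/dlam F(y; lam) / q(y; lam)) * f'(y) ].
    For the exponential, dF/dlam = y*exp(-lam*y) and q = lam*exp(-lam*y),
    so the weight simplifies to y / lam."""
    y = rng.exponential(scale=1.0 / lam, size=n)
    return -(y / lam) * f_prime(y)

n = 100_000
sf = score_function_grad(n)
go = cdf_based_grad(n)
print(f"true gradient        : {true_grad:+.4f}")
print(f"score-function       : {sf.mean():+.4f}  (per-sample var {sf.var():.2f})")
print(f"CDF-based (GO-style) : {go.mean():+.4f}  (per-sample var {go.var():.2f})")
```

Both estimators are unbiased, but on this toy problem the per-sample variance of the score-function estimator ($488/\lambda^6$) is roughly six times that of the CDF-based one ($80/\lambda^6$), consistent with the low-variance claim above.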
