AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation

Entropy is ubiquitous in machine learning, but computing the entropy of the distribution of an arbitrary continuous random variable is in general intractable. In this paper, we propose the amortized residual denoising autoencoder (AR-DAE) to approximate the gradient of the log density function, which in turn can be used to estimate the gradient of entropy. Amortization allows us to significantly reduce the error of the gradient approximator by approaching the asymptotic optimality of a regular DAE, in which case the estimator is unbiased in theory. We conduct theoretical and experimental analyses of the approximation error of the proposed method, as well as extensive studies of heuristics that ensure its robustness. Finally, using the proposed gradient approximator to estimate the gradient of entropy, we demonstrate state-of-the-art performance on density estimation with variational autoencoders and on continuous control with soft actor-critic.
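To make the mechanics concrete: the classical result behind DAE-based score estimation (Vincent, 2011; Alain & Bengio, 2014) is that the optimal Gaussian denoiser satisfies r*(x) ≈ x + sigma^2 * grad_x log p(x) for small noise scale sigma, so a residual parameterization r(x) = x + sigma^2 * f(x, sigma) lets f(x, 0) approximate the score directly. The sketch below is a minimal PyTorch illustration under these assumptions, using the loss form E[ ||u + sigma * f(x + sigma*u, sigma)||^2 ] with u ~ N(0, I) and sigma ~ N(0, delta^2); the class name ResidualScoreNet, the network architecture, and the helper functions are illustrative choices, not the authors' code.

```python
# Hedged sketch of a residual-DAE score estimator and its use for
# entropy-gradient estimation. Names and architecture are illustrative.
import torch
import torch.nn as nn


class ResidualScoreNet(nn.Module):
    """f(x, sigma) ~ grad_x log p(x). The implied denoiser is the residual
    map r(x) = x + sigma^2 * f(x, sigma), so f plays the role of the score."""

    def __init__(self, dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, sigma):
        # Condition on the noise scale so the estimator can be queried
        # at sigma -> 0, where it approaches the true score.
        return self.net(torch.cat([x, sigma], dim=-1))


def ar_dae_loss(f, x, delta=0.1):
    """E_{u ~ N(0,I), sigma ~ N(0, delta^2)} ||u + sigma * f(x + sigma*u, sigma)||^2.

    Substituting r(y) = y + sigma^2 * f(y, sigma) into the usual DAE loss
    E||r(x + sigma*u) - x||^2 and dividing by sigma^2 gives exactly this
    objective, so its minimizer recovers f ~ grad log p.
    """
    u = torch.randn_like(x)
    sigma = delta * torch.randn(x.shape[0], 1, device=x.device)
    return ((u + sigma * f(x + sigma * u, sigma)) ** 2).sum(dim=-1).mean()


def entropy_grad_surrogate(f, z):
    """Surrogate whose gradient w.r.t. the sampler parameters phi matches
    grad_phi H(q_phi) = -E[ J_phi(z)^T grad_z log q_phi(z) ], with the
    learned f(z, 0) plugged in for the intractable score. z must be a
    reparameterized sample z = g_phi(eps) so gradients flow through it.
    """
    zeros = torch.zeros(z.shape[0], 1, device=z.device)
    score = f(z, zeros).detach()  # treat the score estimate as a constant
    return -(score * z).sum(dim=-1).mean()
```

In a training loop one would presumably alternate: fit f on fresh samples from the current q_phi by minimizing ar_dae_loss, then take a gradient step on entropy_grad_surrogate (plus the task loss) with respect to phi, so the score estimator tracks the moving sampler distribution.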
