Learning to Draw Samples: With Application to Amortized MLE for Generative Adversarial Learning

We propose a simple algorithm to train stochastic neural networks to draw samples from given target distributions for probabilistic inference. Our method iteratively adjusts the neural network parameters so that the output moves along a Stein variational gradient (Liu & Wang, 2016), the direction that maximally decreases the KL divergence with the target distribution. Our method works for any target distribution specified by its unnormalized density function, and can train any black-box architecture that is differentiable with respect to the parameters we want to adapt. As an application of our method, we propose an amortized MLE algorithm for training deep energy models, where a neural sampler is adaptively trained to approximate the likelihood function. Our method mimics an adversarial game between the deep energy model and the neural sampler, and obtains realistic-looking images competitive with state-of-the-art results.
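The parameter update described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a toy affine sampler x = a*xi + b with Gaussian noise xi, a one-dimensional Gaussian target known only through its score (the gradient of the unnormalized log density), an RBF kernel with a median-heuristic bandwidth, and an update averaged over the batch rather than summed. The names `svgd_direction` and `train_sampler` and all hyperparameters are hypothetical.

```python
import numpy as np


def svgd_direction(x, score, h=None):
    """Stein variational gradient for particles x of shape (n, d).

    score[j] is the gradient of the unnormalized log density at x[j].
    Uses an RBF kernel k(x, y) = exp(-||x - y||^2 / h); the median-based
    bandwidth below is a common heuristic, assumed here for illustration.
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]        # diff[j, i] = x_j - x_i
    sq = np.sum(diff ** 2, axis=-1)             # squared pairwise distances
    if h is None:
        h = np.median(sq) / np.log(n + 1) + 1e-8
    k = np.exp(-sq / h)                         # k[j, i] = k(x_j, x_i)
    drive = k.T @ score                         # sum_j k(x_j, x_i) * score_j
    # sum_j grad_{x_j} k(x_j, x_i): pushes particle i away from its neighbours
    repulse = -2.0 / h * np.sum(diff * k[:, :, None], axis=0)
    return (drive + repulse) / n


def train_sampler(mu=3.0, sigma=0.5, steps=1000, n=100, lr=0.05, seed=0):
    """Fit the toy sampler x = a * xi + b to the target N(mu, sigma^2)."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0                             # sampler parameters
    for _ in range(steps):
        xi = rng.standard_normal((n, 1))        # fresh noise each step
        x = a * xi + b                          # sampler output
        score = -(x - mu) / sigma ** 2          # needs no normalizing constant
        dx = svgd_direction(x, score)           # desired change of each output
        # Chain rule through the sampler: dx/da = xi, dx/db = 1.
        a += lr * np.mean(xi * dx)
        b += lr * np.mean(dx)
    return float(a), float(b)
```

Calling `train_sampler()` returns the fitted scale and shift. With these settings the shift should approach the target mean and the scale should settle near the target standard deviation, up to finite-batch bias and noise. For a neural sampler, the two hand-written chain-rule lines would be replaced by backpropagating `dx` through the network.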

[1]  C. Stein A bound for the error in the normal approximation to the distribution of a sum of dependent random variables , 1972 .

[2]  Anthony O'Hagan,et al.  Monte Carlo is fundamentally unsound , 1987 .

[3]  C. Geyer Markov Chain Monte Carlo Maximum Likelihood , 1991 .

[4]  A. O'Hagan,et al.  Bayes–Hermite quadrature , 1991 .

[5]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[6]  Louis H. Y. Chen,et al.  An Introduction to Stein's Method , 2005 .

[7]  S. Eguchi,et al.  Importance Sampling Via the Estimated Sampler , 2007 .

[8]  G. Evans,et al.  Learning to Optimize , 2008 .

[9]  Tijmen Tieleman,et al.  Training restricted Boltzmann machines using approximations to the likelihood gradient , 2008, ICML '08.

[10]  Jiquan Ngiam,et al.  Learning Deep Energy Models , 2011, ICML.

[11]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[12]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[13]  N. Chopin,et al.  Control functionals for Monte Carlo integration , 2014, 1410.2392.

[14]  Sean Gerrish,et al.  Black Box Variational Inference , 2013, AISTATS.

[15]  Noah D. Goodman,et al.  Amortized Inference in Probabilistic Reasoning , 2014, CogSci.

[16]  Shakir Mohamed,et al.  Variational Inference with Normalizing Flows , 2015, ICML.

[17]  Michael A. Osborne,et al.  Probabilistic Integration: A Role for Statisticians in Numerical Analysis? , 2015 .

[18]  Yinda Zhang,et al.  LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop , 2015, ArXiv.

[19]  Zoubin Ghahramani,et al.  Training generative neural networks via Maximum Mean Discrepancy optimization , 2015, UAI.

[20]  Lester W. Mackey,et al.  Measuring Sample Quality with Stein's Method , 2015, NIPS.

[21]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[23]  Richard S. Zemel,et al.  Generative Moment Matching Networks , 2015, ICML.

[24]  Yann LeCun,et al.  Energy-based Generative Adversarial Network , 2016, ICLR.

[25]  Yang Lu,et al.  A Theory of Generative ConvNet , 2016, ICML.

[26]  B. Delyon,et al.  Integral approximation by kernel smoothing , 2014, 1409.0733.

[27]  Marcin Andrychowicz,et al.  Learning to learn by gradient descent by gradient descent , 2016, NIPS.

[28]  Wojciech Zaremba,et al.  Improved Techniques for Training GANs , 2016, NIPS.

[29]  Qiang Liu,et al.  A Kernelized Stein Discrepancy for Goodness-of-fit Tests , 2016, ICML.

[30]  Dilin Wang,et al.  Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm , 2016, NIPS.

[31]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[32]  Xinyun Chen Delving into Transferable Adversarial Examples and Black-box Attacks , 2016 .

[33]  Dustin Tran,et al.  Operator Variational Inference , 2016, NIPS.

[34]  Frank D. Wood,et al.  Inference Networks for Sequential Monte Carlo in Graphical Models , 2016, ICML.

[35]  Yoshua Bengio,et al.  Deep Directed Generative Models with Energy-Based Probability Estimation , 2016, ArXiv.

[36]  Arthur Gretton,et al.  A Kernel Test of Goodness of Fit , 2016, ICML.

[37]  Dustin Tran,et al.  Variational Gaussian Process , 2015, ICLR.

[38]  Dustin Tran,et al.  Hierarchical Variational Models , 2015, ICML.

[39]  Sebastian Nowozin,et al.  Learning Step Size Controllers for Robust Neural Network Training , 2016, AAAI.

[40]  Sebastian Nowozin,et al.  f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization , 2016, NIPS.