Learning to Sample Using Stein Discrepancy

We propose a simple algorithm to train stochastic neural networks to draw samples from given target distributions for probabilistic inference. Our method iteratively adjusts the neural network parameters so that the network output changes along a Stein variational gradient direction [1] that maximally decreases the KL divergence with the target distribution. The method works for any target distribution specified by its unnormalized density function, and can train any black-box architecture that is differentiable with respect to the parameters we want to adapt. By making it possible to “learn to draw samples”, our method opens up a host of applications. We present two examples in this paper: 1) we propose an amortized MLE method for training deep energy models, in which a neural sampler is adaptively trained to approximate the likelihood function; our method mimics an adversarial game between the deep energy model and the neural sampler, and obtains realistic-looking images competitive with state-of-the-art results. 2) By treating stochastic gradient Langevin dynamics (SGLD) as a black-box sampler, we train it to automatically adjust its learning rate to maximize its convergence speed, obtaining better performance than hand-designed learning rate schemes.

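To make the core update concrete, the following is a minimal sketch (not the paper's implementation) of the amortized Stein variational update under our own simplifying assumptions: a toy 2-D Gaussian target whose score is known in closed form, an RBF kernel with the median heuristic, and a simple affine sampler f_eta(xi) = xi W + b standing in for a stochastic neural network. Names such as svgd_direction and grad_log_p are illustrative. Each iteration draws random inputs xi, computes the Stein variational gradient phi of the current samples x = f_eta(xi), and propagates phi back into the sampler parameters by the chain rule.

    # Illustrative sketch only: toy 2-D Gaussian target, affine "neural sampler".
    import numpy as np

    def grad_log_p(x, mu=np.array([1.0, -1.0])):
        # Score of an unnormalized Gaussian target N(mu, I): grad_x log p(x).
        return -(x - mu)

    def svgd_direction(x):
        # Stein variational gradient:
        # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
        sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
        h = np.median(sq_dist) / np.log(x.shape[0] + 1.0) + 1e-8   # median heuristic
        k = np.exp(-sq_dist / h)                                   # RBF kernel matrix
        grad_k = -2.0 / h * (x[:, None, :] - x[None, :, :]) * k[:, :, None]
        return (k @ grad_log_p(x) + grad_k.sum(axis=0)) / x.shape[0]

    # Affine sampler f_eta(xi) = xi @ W + b; the parameters eta = (W, b).
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(2, 2)), np.zeros(2)
    for _ in range(500):
        xi = rng.normal(size=(100, 2))      # random inputs to the sampler
        x = xi @ W + b                      # current samples
        phi = svgd_direction(x)             # direction that maximally decreases KL
        # Chain rule: accumulate (d f_eta(xi_i) / d eta)^T phi(x_i) over the batch.
        W += 0.1 * xi.T @ phi / xi.shape[0]
        b += 0.1 * phi.mean(axis=0)

    print("sampler mean:", (rng.normal(size=(2000, 2)) @ W + b).mean(axis=0))  # ~ [1, -1]

The same update applies unchanged when f_eta is a deep network: replace the closed-form chain rule with backpropagation of phi through the network, as in the amortized MLE and SGLD applications described above.
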
[1] Qiang Liu, et al. Two Methods for Wild Variational Inference, 2016, arXiv:1612.00081.

[2] Qiang Liu, et al. Black-box Importance Sampling, 2016, AISTATS.

[3] Dilin Wang, et al. Learning to Draw Samples: With Application to Amortized MLE for Generative Adversarial Learning, 2016, arXiv.

[4] Dustin Tran, et al. Operator Variational Inference, 2016, NIPS.

[5] Yann LeCun, et al. Energy-based Generative Adversarial Network, 2016, ICLR.

[6] Dilin Wang, et al. Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm, 2016, NIPS.

[7] Marcin Andrychowicz, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.

[8] Qiang Liu, et al. A Kernelized Stein Discrepancy for Goodness-of-fit Tests, 2016, ICML.

[9] Arthur Gretton, et al. A Kernel Test of Goodness of Fit, 2016, ICML.

[10] Dustin Tran, et al. Variational Gaussian Process, 2015, ICLR.

[11] Soumith Chintala, et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015, ICLR.

[12] Dustin Tran, et al. Hierarchical Variational Models, 2015, ICML.

[13] B. Delyon, et al. Integral approximation by kernel smoothing, 2014, arXiv:1409.0733.

[14] Michael A. Osborne, et al. Probabilistic Integration: A Role for Statisticians in Numerical Analysis?, 2015.

[15] Yinda Zhang, et al. LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop, 2015, arXiv.

[16] Lester W. Mackey, et al. Measuring Sample Quality with Stein's Method, 2015, NIPS.

[17] Shakir Mohamed, et al. Variational Inference with Normalizing Flows, 2015, ICML.

[18] Xiaogang Wang, et al. Deep Learning Face Attributes in the Wild, 2015, ICCV.

[19] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[20] N. Chopin, et al. Control functionals for Monte Carlo integration, 2014, arXiv:1410.2392.

[21] Sean Gerrish, et al. Black Box Variational Inference, 2013, AISTATS.

[22] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.

[24] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.

[25] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.

[26] S. Eguchi, et al. Importance Sampling Via the Estimated Sampler, 2007.

[27] A. O'Hagan, et al. Bayes–Hermite quadrature, 1991.

[28] Anthony O'Hagan, et al. Monte Carlo is fundamentally unsound, 1987.