Learning Deep Generative Models with Short Run Inference Dynamics

This paper studies the fundamental problem of learning deep generative models that consist of one or more layers of latent variables organized in a top-down architecture. Learning such a model requires inferring the latent variables for each training example from their posterior distribution, which typically calls for Markov chain Monte Carlo (MCMC) sampling and can be time consuming. In this paper, we propose to use short run inference dynamics guided by the log-posterior, such as a finite-step gradient descent algorithm initialized from the prior distribution of the latent variables, as an approximate sampler of the posterior distribution. The step size of the gradient descent dynamics is optimized by minimizing the Kullback-Leibler divergence between the distribution produced by the short run inference dynamics and the posterior distribution. Our experiments show that the proposed method outperforms the variational auto-encoder (VAE) in terms of reconstruction error and synthesis quality. An advantage of the proposed method is that it is natural and automatic, even for models with multiple layers of latent variables.

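To make the idea concrete, the following is a minimal PyTorch sketch of short run inference dynamics for a single-layer latent variable model with a Gaussian observation model. The toy `Generator`, the hyper-parameter values, and the deterministic (noise-free) gradient step are illustrative assumptions for this sketch, not the paper's exact architecture or settings; a Langevin variant would add noise at each step.

```python
import torch
import torch.nn as nn

# Minimal sketch of short run inference dynamics (illustrative assumptions only).

class Generator(nn.Module):
    """Toy top-down decoder g(z); stands in for the paper's generator network."""
    def __init__(self, latent_dim=64, data_dim=784):
        super().__init__()
        self.latent_dim = latent_dim
        self.net = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, data_dim))

    def forward(self, z):
        return self.net(z)

def short_run_inference(x, generator, n_steps=25, step_size=0.05, sigma=0.3):
    # Initialize z from the prior N(0, I), then run a finite number of
    # gradient ascent steps on the log-posterior
    #   log p(z | x) = -||z||^2 / 2 - ||x - g(z)||^2 / (2 sigma^2) + const.
    z = torch.randn(x.shape[0], generator.latent_dim, requires_grad=True)
    for _ in range(n_steps):
        recon = generator(z)
        log_posterior = (-0.5 * (z ** 2).sum(dim=1)
                         - 0.5 * ((x - recon) ** 2).sum(dim=1) / sigma ** 2)
        grad = torch.autograd.grad(log_posterior.sum(), z)[0]
        # Deterministic gradient step guided by the log-posterior; a Langevin
        # variant would add noise of scale sqrt(2 * step_size) each step.
        z = (z + step_size * grad).detach().requires_grad_(True)
    return z.detach()

# Usage: infer latent codes for a batch of flattened images.
g = Generator()
x = torch.rand(8, 784)
z_hat = short_run_inference(x, g)
```

In the full method, the step size of these dynamics would itself be tuned by minimizing the KL divergence between the distribution of the short run samples and the true posterior, which this sketch does not attempt.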