Improved Contrastive Divergence Training of Energy-Based Models

We propose several different techniques to improve contrastive divergence training of energy-based models (EBMs). We first show that a gradient term neglected in the popular contrastive divergence formulation is both tractable to estimate and important for avoiding the training instabilities seen in previous models. We further highlight how data augmentation, multi-scale processing, and reservoir sampling can be used to improve model robustness and generation quality. Finally, we empirically evaluate the stability of model architectures and show improved performance on a host of benchmarks and use cases, such as image generation, out-of-distribution (OOD) detection, and compositional generation.
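To make the neglected gradient term concrete, the following is a minimal sketch of the standard decomposition, using our own shorthand (data distribution p_D, model distribution p_theta proportional to exp(-E_theta), and sampler distribution q_theta produced by a finite MCMC chain) rather than notation taken verbatim from the paper:

    \mathcal{L}_{\mathrm{CD}} \;=\; \mathrm{KL}\!\left(p_D \,\|\, p_\theta\right) \;-\; \mathrm{KL}\!\left(q_\theta \,\|\, p_\theta\right)

    \frac{\partial \mathcal{L}_{\mathrm{CD}}}{\partial \theta}
      \;=\; \mathbb{E}_{p_D(x)}\!\left[\frac{\partial E_\theta(x)}{\partial \theta}\right]
      \;-\; \mathbb{E}_{q_\theta(x')}\!\left[\frac{\partial E_\theta(x')}{\partial \theta}\right]
      \;-\; \left.\frac{\partial\, \mathrm{KL}\!\left(q_\omega \,\|\, p_\theta\right)}{\partial \omega}\right|_{\omega=\theta}

The first two expectations are the familiar positive and negative phases of contrastive divergence; the final term arises because the sampler distribution q_theta itself depends on theta, and it is this term that is commonly dropped and that the abstract argues is both tractable to estimate and important for stability.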

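As an illustration of the reservoir sampling idea mentioned above, here is a minimal Python sketch of a reservoir-style replay buffer (the classic Algorithm R); the class name, interface, and its use for initializing MCMC chains are illustrative assumptions, not the paper's actual implementation.

    import random

    class ReservoirBuffer:
        """Keeps a uniform random subset of every sample ever offered,
        using only O(capacity) memory (classic Algorithm R)."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.buffer = []
            self.n_seen = 0  # total number of samples ever offered

        def add(self, sample):
            self.n_seen += 1
            if len(self.buffer) < self.capacity:
                self.buffer.append(sample)
            else:
                # Replace a stored entry with probability capacity / n_seen,
                # so every past sample is equally likely to remain in the buffer.
                j = random.randrange(self.n_seen)
                if j < self.capacity:
                    self.buffer[j] = sample

        def sample(self, k):
            # Draw k stored samples (with replacement), e.g. to initialize
            # new MCMC chains from a long history of past negative samples.
            return [random.choice(self.buffer) for _ in range(k)]

Compared with a first-in-first-out replay buffer, a reservoir retains negatives drawn uniformly from the entire training history rather than only recent iterations, which is one way such a buffer can contribute to the robustness improvements the abstract refers to.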