Compositional Visual Generation and Inference with Energy Based Models

A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge. In this paper we show that energy-based models can exhibit this ability by directly combining probability distributions. Samples from the combined distribution correspond to compositions of concepts. For example, given a distribution for smiling faces and another for male faces, we can combine them to generate smiling male faces. This allows us to generate natural images that simultaneously satisfy conjunctions, disjunctions, and negations of concepts. We evaluate the compositional generation abilities of our model on the CelebA dataset of natural faces and on synthetic 3D scene images. We also demonstrate other unique advantages of our model, such as the ability to continually learn and incorporate new concepts, and to infer the composition of concept properties underlying an image.

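To make the composition rules concrete, the following is a minimal, self-contained PyTorch sketch, not the paper's implementation. With p(x) ∝ exp(-E(x)), conjunction (AND) sums the concept energies (a product of experts), disjunction (OR) takes the negative logsumexp of the negated energies (a mixture), and negation (NOT) flips an energy's sign. The two quadratic "concept" energies and all function names below are hypothetical placeholders standing in for trained ConvNet energy functions; approximate samples are drawn with Langevin dynamics.

```python
import torch

# Hypothetical stand-ins for trained concept EBMs. In the paper each E(x)
# is a deep ConvNet trained on images; these quadratic bowls are placeholders
# so the sketch runs end to end.
def energy_smiling(x):
    return ((x - 1.0) ** 2).sum(dim=-1)

def energy_male(x):
    return ((x + 1.0) ** 2).sum(dim=-1)

# Composition operators on energies, with p(x) proportional to exp(-E(x)).
def conjunction(energies):
    # AND: product of experts -- energies add.
    return lambda x: sum(E(x) for E in energies)

def disjunction(energies):
    # OR: mixture of experts -- negative logsumexp of negated energies.
    return lambda x: -torch.logsumexp(
        torch.stack([-E(x) for E in energies]), dim=0)

def negation(E, alpha=1.0):
    # NOT: invert the energy landscape, scaled by a temperature alpha.
    return lambda x: -alpha * E(x)

def langevin_sample(E, x, steps=200, step_size=0.01):
    """Approximately sample from p(x) ~ exp(-E(x)) via Langevin dynamics."""
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(E(x).sum(), x)[0]
        x = x - 0.5 * step_size * grad \
            + torch.randn_like(x) * step_size ** 0.5
    return x.detach()

# "Smiling AND male": sample from the summed energy landscape.
E_both = conjunction([energy_smiling, energy_male])
samples = langevin_sample(E_both, torch.randn(16, 2))
print(samples.mean(dim=0))  # concentrates near the joint low-energy region
```

Swapping `conjunction` for `disjunction`, or wrapping one energy in `negation`, yields the other logical compositions without retraining either concept model.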