Controllable and Compositional Generation with Latent-Space Energy-Based Models

Controllable generation is one of the key requirements for the successful adoption of deep generative models in real-world applications, but it remains a great challenge. In particular, the compositional ability to generate novel concept combinations is out of reach for most current models. In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes. To make them scalable to high-resolution image generation, we introduce an EBM in the latent space of a pre-trained generative model such as StyleGAN. We propose a novel EBM formulation representing the joint distribution of data and attributes together, and we show how sampling from it can be formulated as solving an ordinary differential equation (ODE). Given a pre-trained generator, all we need for controllable generation is to train an attribute classifier. Sampling with ODEs is done efficiently in the latent space and is robust to hyperparameters. Thus, our method is simple, fast to train, and efficient to sample from. Experimental results show that our method outperforms the state-of-the-art in both conditional sampling and sequential editing. In compositional generation, our method excels at zero-shot generation of unseen attribute combinations. Moreover, by composing energy functions with logical operators, this work is the first to achieve such compositionality in generating photo-realistic images at 1024×1024 resolution.
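To make the formulation concrete: with a latent w of a pre-trained generator and attributes c, the joint density can be written as p(w, c) ∝ p(w) ∏_i p(c_i | w), with energy E(w, c) = -log p(w) - Σ_i log p(c_i | w); summing per-attribute energies acts as a product of experts, i.e. a logical AND over attribute constraints. Below is a minimal sketch of latent-space sampling under such an energy. It assumes a standard-normal latent prior, hypothetical per-attribute classifiers that take the latent directly, and a plain fixed-step Euler integrator standing in for the adaptive ODE solver used in the paper; it is an illustration of the idea, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Stand-ins for the paper's components (names here are illustrative):
#   generator:   a pre-trained generator, e.g. StyleGAN, mapping latents to images
#   classifiers: dict of per-attribute classifiers returning logits given a latent

def joint_energy(w, targets, classifiers):
    """E(w, c) = -log p(w) - sum_i log p(c_i | w), with a standard-normal
    latent prior and classifier-defined attribute conditionals."""
    prior_energy = 0.5 * (w ** 2).sum(dim=1)  # -log N(w; 0, I), up to a constant
    attr_energy = sum(
        F.cross_entropy(classifiers[name](w), target, reduction="none")
        for name, target in targets.items()
    )  # summing energies composes attributes conjunctively (product of experts)
    return prior_energy + attr_energy

def sample_latents(targets, classifiers, n=8, dim=512, n_steps=200, step_size=1e-2):
    """Deterministic sampling by following dw/dt = -grad_w E(w, c) from a
    Gaussian initialization; a fixed-step Euler loop stands in for the
    adaptive ODE solver used in the paper."""
    w = torch.randn(n, dim)
    for _ in range(n_steps):
        w = w.detach().requires_grad_(True)
        energy = joint_energy(w, targets, classifiers).sum()
        grad = torch.autograd.grad(energy, w)[0]
        w = (w - step_size * grad).detach()
    return w

# Usage (hypothetical attribute names and classifiers):
#   targets = {"smiling": torch.ones(8, dtype=torch.long)}
#   w = sample_latents(targets, classifiers)
#   images = generator(w)
```

Other logical operators follow the same pattern: for instance, negating an attribute could be sketched by flipping the sign of that classifier's energy term before summing, turning the conjunction into "attribute A AND NOT attribute B". This mirrors the kind of logical composition the abstract describes rather than reproducing the paper's exact operators.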
