D2C: Diffusion-Decoding Models for Few-Shot Conditional Generation

Conditional generative models of high-dimensional images have many applications, but supervision signals from conditions to images can be expensive to acquire. This paper describes Diffusion-Decoding models with Contrastive representations (D2C), a paradigm for training unconditional variational autoencoders (VAEs) for few-shot conditional image generation. D2C uses a learned diffusion-based prior over the latent representations to improve generation and contrastive self-supervised learning to improve representation quality. D2C can adapt to novel generation tasks conditioned on labels or manipulation constraints by learning from as few as 100 labeled examples. On conditional generation from new labels, D2C achieves superior performance over state-of-the-art VAEs and diffusion models. On conditional image manipulation, D2C generations are two orders of magnitude faster to produce than StyleGAN2 ones and are preferred by 50% to 60% of the human evaluators in a double-blind study. We release our code at https://github.com/jiamings/d2c.
