Paint by Word

We investigate the problem of zero-shot semantic image painting. Instead of painting modifications into an image using only concrete colors or a finite set of semantic concepts, we ask how to create semantic paint based on open full-text descriptions: our goal is to be able to point to a location in a synthesized image and apply an arbitrary new concept such as "rustic," "opulent," or "happy dog." To do this, our method combines a state-of-the-art generative model of realistic images with a state-of-the-art text-image semantic similarity network. We find that, to make large changes, it is important to use non-gradient methods to explore latent space, and it is important to relax the computations of the GAN to target changes to a specific region. We conduct user studies to compare our method to several baselines.

*Equal contribution
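The pipeline the abstract describes — score a masked region of a generated image against a text description, and search the latent space with a gradient-free optimizer — can be sketched with toy stand-ins. In this sketch, `generate`, `clip_score`, and `evolve_latent` are hypothetical placeholders, not the paper's actual components: a real system would use a pretrained GAN generator and a text-image similarity network such as CLIP, and a stronger non-gradient optimizer such as CMA-ES.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate(z):
    """Dummy stand-in for a GAN generator G(z): maps an
    8-dim latent to an 8x8 'image'."""
    return np.outer(np.tanh(z), np.tanh(z))

def clip_score(image, mask):
    """Dummy stand-in for text-image similarity, evaluated only on
    the masked region (the paper targets edits to a specific region).
    A real system would embed the masked region and the text prompt
    and return their cosine similarity."""
    return float(image[mask].mean())

def evolve_latent(z0, mask, pop=32, iters=50, sigma=0.5):
    """Simple (1+lambda) evolution strategy: a gradient-free search
    over latent space, in the spirit of the observation that
    non-gradient methods help make large semantic changes."""
    z = z0.copy()
    best = clip_score(generate(z), mask)
    for _ in range(iters):
        # Sample a population of perturbed latents and keep the best.
        candidates = z + sigma * rng.standard_normal((pop, z.size))
        scores = [clip_score(generate(c), mask) for c in candidates]
        i = int(np.argmax(scores))
        if scores[i] > best:
            z, best = candidates[i], scores[i]
    return z, best

# "Paint" only the top-left quadrant of the image.
mask = np.zeros((8, 8), dtype=bool)
mask[:4, :4] = True

z0 = rng.standard_normal(8)
z_opt, score = evolve_latent(z0, mask)
# The kept-best rule guarantees the score never decreases.
assert score >= clip_score(generate(z0), mask)
```

Because the score is computed only inside the mask, the search is free to change the latent however it likes as long as the masked region moves toward the target concept; region targeting in the actual method is done inside the GAN's computation rather than by masking the score alone.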
