Surrogate Gradient Field for Latent Space Manipulation

Generative adversarial networks (GANs) can generate high-quality images from sampled latent codes. Recent works attempt to edit an image by manipulating its underlying latent code, but rarely go beyond the basic task of attribute adjustment. We propose the first method that enables manipulation with multidimensional conditions such as keypoints and captions. Specifically, we design an algorithm that searches for a new latent code that satisfies the target condition based on the Surrogate Gradient Field (SGF) induced by an auxiliary mapping network. For quantitative comparison, we propose a metric to evaluate the disentanglement of manipulation methods. Thorough experimental analysis on the facial attribute adjustment task shows that our method outperforms state-of-the-art methods in disentanglement. We further apply our method to tasks of various condition modalities to demonstrate that it can alter complex image properties such as keypoints and captions.
