Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation

Controllable semantic image editing enables a user to change entire image attributes with a few clicks, e.g., gradually making a summer scene look like it was taken in winter. Classic approaches for this task use a Generative Adversarial Network (GAN) to learn a latent space and suitable latent-space transformations. However, current approaches often suffer from entangled attribute edits, global changes to image identity, and diminished photo-realism. To address these concerns, we learn multiple attribute transformations simultaneously, integrate attribute regression into the training of the transformation functions, and apply a content loss and an adversarial loss that encourage the preservation of image identity and photo-realism. Unlike prior work, which primarily focuses on qualitative evaluation, we propose quantitative strategies for measuring controllable editing performance. Our model permits better control for both single- and multiple-attribute editing, while also preserving image identity and realism during transformation. We provide empirical results for both real and synthetic images, highlighting that our model achieves state-of-the-art performance for targeted image manipulation.
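As a rough illustration of the latent-space navigation described above, the sketch below shifts a latent code along one learned direction per attribute, so several attributes can be edited jointly with signed strengths. The attribute names, dimensions, and random unit directions are placeholders for exposition; in the actual method the directions come from transformation functions trained jointly with attribute regression, content, and adversarial losses.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 512        # typical GAN latent size (assumption)
NUM_ATTRIBUTES = 3      # e.g. season, lighting, snow (illustrative)

# Stand-ins for learned per-attribute directions: random unit vectors.
directions = rng.normal(size=(NUM_ATTRIBUTES, LATENT_DIM))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

def edit_latent(z, strengths):
    """Move a latent code along the per-attribute directions.

    z: (LATENT_DIM,) latent code of the image to edit.
    strengths: (NUM_ATTRIBUTES,) signed edit magnitudes, one per
        attribute (e.g. +0.8 = 'more winter', -0.5 = 'less snow').
    Returns the edited latent code, to be decoded by the generator.
    """
    return z + strengths @ directions

z = rng.normal(size=LATENT_DIM)
z_edited = edit_latent(z, np.array([0.8, 0.0, -0.5]))
```

Decoding `z_edited` with the (frozen) generator would then yield the edited image; a strength of zero for every attribute leaves the latent code, and hence the image, unchanged.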
