Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation

Controllable semantic image editing enables a user to change entire image attributes with a few clicks, e.g., gradually making a summer scene look like it was taken in winter. Classic approaches for this task use a Generative Adversarial Network (GAN) to learn a latent space and suitable latent-space transformations. However, current approaches often suffer from entangled attribute edits, global changes to image identity, and diminished photo-realism. To address these concerns, we learn multiple attribute transformations simultaneously, integrate attribute regression into the training of the transformation functions, and apply a content loss and an adversarial loss that encourage the preservation of image identity and photo-realism. Unlike prior work, which primarily focuses on qualitative evaluation, we propose quantitative strategies for measuring controllable editing performance. Our model permits better control for both single- and multiple-attribute editing, while also preserving image identity and realism during transformation. We provide empirical results for both real and synthetic images, highlighting that our model achieves state-of-the-art performance for targeted image manipulation.
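As a rough illustration of the latent-space navigation described above, the sketch below shifts a latent code along one learned direction per attribute, so several attributes can be edited jointly with signed strengths. The attribute names, dimensions, and random unit directions are placeholders for exposition; in the actual method the directions come from transformation functions trained jointly with attribute regression, content, and adversarial losses.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 512        # typical GAN latent size (assumption)
NUM_ATTRIBUTES = 3      # e.g. season, lighting, snow (illustrative)

# Stand-ins for learned per-attribute directions: random unit vectors.
directions = rng.normal(size=(NUM_ATTRIBUTES, LATENT_DIM))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

def edit_latent(z, strengths):
    """Move a latent code along the per-attribute directions.

    z: (LATENT_DIM,) latent code of the image to edit.
    strengths: (NUM_ATTRIBUTES,) signed edit magnitudes, one per
        attribute (e.g. +0.8 = 'more winter', -0.5 = 'less snow').
    Returns the edited latent code, to be decoded by the generator.
    """
    return z + strengths @ directions

z = rng.normal(size=LATENT_DIM)
z_edited = edit_latent(z, np.array([0.8, 0.0, -0.5]))
```

Decoding `z_edited` with the (frozen) generator would then yield the edited image; a strength of zero for every attribute leaves the latent code, and hence the image, unchanged.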
