SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Editing

Recently, large pretrained models (e.g., BERT, StyleGAN, CLIP) have shown strong knowledge transfer and generalization capability on various downstream tasks within their domains. Inspired by these efforts, in this paper we propose a unified model for open-domain image editing, focusing on color and tone adjustment of open-domain images while preserving their original content and structure. Our model learns a unified editing space that is more semantic, intuitive, and easier to manipulate than the operation space (e.g., contrast, brightness, color curves) used in much existing photo-editing software. Our model follows the image-to-image translation framework, consisting of an image encoder and decoder, and is trained on pairs of before- and after-images to produce multimodal outputs. We show that by inverting image pairs into latent codes of the learned editing space, our model can be leveraged for various downstream editing tasks such as language-guided image editing, personalized editing, editing-style clustering, and retrieval. We extensively study the unique properties of the editing space in experiments and demonstrate superior performance on these tasks.
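
The encoder/decoder idea described above can be illustrated with a minimal sketch: an encoder maps a (before, after) image pair to a compact editing code, and a decoder applies that code to a new image while leaving spatial structure untouched. The module names, layer sizes, code dimension, and FiLM-style modulation below are illustrative assumptions for this sketch, not the paper's actual architecture.

```python
# Minimal sketch of an editing-space encoder/decoder (illustrative only;
# all architectural choices here are assumptions, not the paper's design).
import torch
import torch.nn as nn

class EditEncoder(nn.Module):
    """Maps a (before, after) image pair to a latent editing code."""
    def __init__(self, code_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, code_dim),
        )

    def forward(self, before, after):
        # Concatenate the pair along the channel axis and encode the "edit".
        return self.net(torch.cat([before, after], dim=1))

class EditDecoder(nn.Module):
    """Applies an editing code to a new image (color/tone, not structure)."""
    def __init__(self, code_dim=64):
        super().__init__()
        self.image_branch = nn.Conv2d(3, 32, 3, padding=1)
        self.code_branch = nn.Linear(code_dim, 32)
        self.out = nn.Conv2d(32, 3, 3, padding=1)

    def forward(self, image, code):
        feat = self.image_branch(image)
        # Modulate image features with the editing code (FiLM-style scaling),
        # which changes color/tone globally while keeping spatial layout.
        scale = self.code_branch(code).unsqueeze(-1).unsqueeze(-1)
        return torch.sigmoid(self.out(feat * (1 + scale)))

# Toy usage: infer an editing code from one pair and reuse it on a new image.
enc, dec = EditEncoder(), EditDecoder()
before, after = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
code = enc(before, after)        # "invert" the pair into the editing space
new_image = torch.rand(1, 3, 64, 64)
edited = dec(new_image, code)    # transfer the same edit to another image
print(edited.shape)              # torch.Size([1, 3, 64, 64])
```

In the full model the outputs are multimodal (sampling different codes yields different plausible edits), but the single-code transfer shown here is enough to convey how inversion into the editing space supports downstream uses such as personalized editing, editing-style clustering, and retrieval.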
