Latent Transformations via NeuralODEs for GAN-based Image Editing

Recent advances in high-fidelity semantic image editing heavily rely on the presumably disentangled latent spaces of the state-of-the-art generative models, such as StyleGAN. Specifically, recent works show that it is possible to achieve decent controllability of attributes in face images via linear shifts along with latent directions. Several recent methods address the discovery of such directions, implicitly assuming that the state-of-the-art GANs learn the latent spaces with inherently linearly separable attribute distributions and semantic vector arithmetic properties. In our work, we show that nonlinear latent code manipulations realized as flows of a trainable Neural ODE are beneficial for many practical non-face image domains with more complex non-textured factors of variation. In particular, we investigate a large number of datasets with known attributes and demonstrate that certain attribute manipulations are challenging to obtain with linear shifts only.

[1]  Jaakko Lehtinen,et al.  Analyzing and Improving the Image Quality of StyleGAN , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  C'eline Hudelot,et al.  Controlling generative models with continuous factors of variations , 2020, ICLR.

[3]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[4]  Xiaofeng Tao,et al.  Transient attributes for high-level understanding and editing of outdoor scenes , 2014, ACM Trans. Graph..

[5]  Peter Wonka,et al.  Image2StyleGAN++: How to Edit the Embedded Images? , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  David Duvenaud,et al.  Neural Ordinary Differential Equations , 2018, NeurIPS.

[7]  David Bau,et al.  Rewriting a Deep Generative Model , 2020, ECCV.

[8]  Oluwasanmi Koyejo,et al.  Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation , 2021, ICLR.

[9]  Omkar M. Parkhi,et al.  VGGFace2: A Dataset for Recognising Faces across Pose and Age , 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[10]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[13]  Peter Wonka,et al.  Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space? , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Aude Oliva,et al.  GANalyze: Toward Visual Definitions of Cognitive Image Properties , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  Antonio Torralba,et al.  The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement , 2020, ECCV.

[16]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[17]  Stefan Bauer,et al.  On the Transfer of Inductive Bias from Simulation to the Real World: a New Disentanglement Dataset , 2019, NeurIPS.

[18]  Artem Babenko,et al.  Navigating the GAN Parameter Space for Semantic Image Editing , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Raja Bala,et al.  Editing in Style: Uncovering the Local Semantics of GANs , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Ron Banner,et al.  GAN Steerability without optimization , 2020, ICLR.

[21]  Alexei A. Efros,et al.  Toward Multimodal Image-to-Image Translation , 2017, NIPS.

[22]  Bolei Zhou,et al.  GAN Dissection: Visualizing and Understanding Generative Adversarial Networks , 2018, ICLR.

[23]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[24]  Anjul Patney,et al.  Semi-Supervised StyleGAN for Disentanglement Learning , 2020, ICML.

[25]  Jaakko Lehtinen,et al.  GANSpace: Discovering Interpretable GAN Controls , 2020, NeurIPS.

[26]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Yu-Ding Lu,et al.  DRIT++: Diverse Image-to-Image Translation via Disentangled Representations , 2020, International Journal of Computer Vision.

[28]  Bolei Zhou,et al.  Interpreting the Latent Space of GANs for Semantic Face Editing , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Peter Wonka,et al.  StyleFlow: Attribute-conditioned Exploration of StyleGAN-Generated Images using Conditional Continuous Normalizing Flows , 2020, ArXiv.

[30]  Artem Babenko,et al.  Unsupervised Discovery of Interpretable Directions in the GAN Latent Space , 2020, ICML.

[31]  Tero Karras,et al.  Training Generative Adversarial Networks with Limited Data , 2020, NeurIPS.

[32]  Phillip Isola,et al.  On the "steerability" of generative adversarial networks , 2019, ICLR.

[33]  Bolei Zhou,et al.  Places: A 10 Million Image Database for Scene Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Jung-Woo Ha,et al.  StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Harshad Rai,et al.  Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks , 2018 .