LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis

This work presents an easy-to-use regularizer for GAN training that explicitly links some axes of the latent space to an image region or a semantic category (e.g., sky) in the synthesized image. Establishing such a link makes local control of GAN generation more convenient: users can alter image content within a single spatial area simply by partially resampling the latent code. Experimental results confirm four appealing properties of our regularizer, which we call LinkGAN. (1) Any image region can be linked to the latent space, even if the region is pre-selected before training and fixed for all instances. (2) Two or more regions can be independently linked to different latent axes, which, surprisingly, allows tokenized control of synthesized images. (3) Our regularizer improves the spatial controllability of both 2D and 3D GAN models while barely sacrificing synthesis performance. (4) Models trained with our regularizer are compatible with GAN inversion techniques and maintain editability on real images.
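
The partial resampling described above can be illustrated with a minimal sketch. It is not the authors' implementation: the latent dimensionality, the choice of which axes are linked to a region, the standard normal prior, and the helper name `resample_linked_axes` are all assumptions made for illustration, and the trained generator that would turn each code into an image is not shown.

```python
import numpy as np


def resample_linked_axes(z, linked_axes, rng):
    """Return a copy of z in which only the linked axes are redrawn from N(0, 1).

    Hypothetical helper: all other axes are left untouched, so a generator
    trained with a linking regularizer would be expected to change only the
    image region tied to `linked_axes`.
    """
    z_new = z.copy()
    z_new[..., linked_axes] = rng.standard_normal(z[..., linked_axes].shape)
    return z_new


rng = np.random.default_rng(0)

latent_dim = 512                  # assumed latent size
linked_axes = np.arange(0, 64)    # assumed: first 64 axes are linked to one region

# A batch of four latent codes drawn from the assumed prior.
z = rng.standard_normal((4, latent_dim))

# Edited codes: same as z except on the linked axes.
z_edit = resample_linked_axes(z, linked_axes, rng)
```

Feeding `z` and `z_edit` to the same trained generator would then give pairs of images that are meant to differ only inside the linked region. Linking several regions amounts to keeping one index set per region and resampling them independently.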
