Collaborative Learning for Faster StyleGAN Embedding

The latent code of the recent popular model StyleGAN has learned disentangled representations thanks to the multi-layer style-based generator. Embedding a given image back to the latent space of StyleGAN enables wide interesting semantic image editing applications. Although previous works are able to yield impressive inversion results based on an optimization framework, which however suffers from the efficiency issue. In this work, we propose a novel collaborative learning framework that consists of an efficient embedding network and an optimization-based iterator. On one hand, with the progress of training, the embedding network gives a reasonable latent code initialization for the iterator. On the other hand, the updated latent code from the iterator in turn supervises the embedding network. In the end, high-quality latent code can be obtained efficiently with a single forward pass through our embedding network. Extensive experiments demonstrate the effectiveness and efficiency of our work.

[1]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[2]  Peter Wonka,et al.  Image2StyleGAN++: How to Edit the Embedded Images? , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Roger Wattenhofer,et al.  TreeConnect: A Sparse Alternative to Fully Connected Layers , 2018, 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI).

[4]  Alexei A. Efros,et al.  Toward Multimodal Image-to-Image Translation , 2017, NIPS.

[5]  Jung-Woo Ha,et al.  StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Dong Xu,et al.  Collaborative and Adversarial Network for Unsupervised Domain Adaptation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Anil A. Bharath,et al.  Inverting the Generator of a Generative Adversarial Network , 2016, IEEE Transactions on Neural Networks and Learning Systems.

[8]  Taesung Park,et al.  Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Liang Lin,et al.  BeautyGAN: Instance-level Facial Makeup Transfer with Deep Generative Adversarial Network , 2018, ACM Multimedia.

[10]  Serge J. Belongie,et al.  Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Gang Hua,et al.  Collaborative Deep Reinforcement Learning for Joint Object Search , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Guocong Song,et al.  Collaborative Learning for Deep Neural Networks , 2018, NeurIPS.

[13]  Sergey Levine,et al.  Stochastic Adversarial Video Prediction , 2018, ArXiv.

[14]  Jan Kautz,et al.  Unsupervised Image-to-Image Translation Networks , 2017, NIPS.

[15]  Ling Shao,et al.  Collaborative Learning of Semi-Supervised Segmentation and Classification for Medical Images , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Hongdong Li,et al.  Neural Collaborative Subspace Clustering , 2019, ICML.

[19]  Devi Parikh,et al.  Cooperative Learning with Visual Attributes , 2017, ArXiv.

[20]  Jeff Donahue,et al.  Large Scale GAN Training for High Fidelity Natural Image Synthesis , 2018, ICLR.

[21]  Dan Xu,et al.  Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Vighnesh Birodkar,et al.  Unsupervised Learning of Disentangled Representations from Video , 2017, NIPS.

[24]  Yann LeCun,et al.  Deep multi-scale video prediction beyond mean square error , 2015, ICLR.

[25]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Geoffrey E. Hinton,et al.  A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[27]  Xiao Liu,et al.  STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[29]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[30]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[31]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[32]  Andrea Vedaldi,et al.  Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[33]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Fumin Shen,et al.  Make a Face: Towards Arbitrary High Fidelity Face Manipulation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35]  Jaakko Lehtinen,et al.  Analyzing and Improving the Image Quality of StyleGAN , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Bolei Zhou,et al.  Interpreting the Latent Space of GANs for Semantic Face Editing , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Chu-Song Chen,et al.  Cross-Age Reference Coding for Age-Invariant Face Recognition and Retrieval , 2014, ECCV.

[38]  Stefanos Zafeiriou,et al.  ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[40]  Peter Wonka,et al.  Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space? , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[41]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.