StarGAN v2: Diverse Image Synthesis for Multiple Domains

A good image-to-image translation model should learn a mapping between different visual domains while satisfying the following properties: 1) diversity of generated images and 2) scalability over multiple domains. Existing methods address either of the issues, having limited diversity or multiple models for all domains. We propose StarGAN v2, a single framework that tackles both and shows significantly improved results over the baselines. Experiments on CelebA-HQ and a new animal faces dataset (AFHQ) validate our superiority in terms of visual quality, diversity, and scalability. To better assess image-to-image translation models, we release AFHQ, high-quality animal faces with large inter- and intra-domain differences. The code, pretrained models, and dataset are available at https://github.com/clovaai/stargan-v2.

[1]  Luc Van Gool,et al.  ComboGAN: Unrestrained Scalability for Image Domain Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[2]  Adam Finkelstein,et al.  PairedCycleGAN: Asymmetric Style Transfer for Applying and Removing Makeup , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[4]  Jaegul Choo,et al.  MISO: Mutual Information Loss with Stochastic Style Representations for Multimodal Image-to-Image Translation , 2019, ArXiv.

[5]  Jeff Donahue,et al.  Large Scale GAN Training for High Fidelity Natural Image Synthesis , 2018, ICLR.

[6]  Luc Van Gool,et al.  SMIT: Stochastic Multi-Label Image-to-Image Translation , 2018, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[7]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[8]  Jan Kautz,et al.  Multimodal Unsupervised Image-to-Image Translation , 2018, ECCV.

[9]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Serge J. Belongie,et al.  Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Luc Van Gool,et al.  Exemplar Guided Unsupervised Image-to-Image Translation with Semantic Consistency , 2018, ICLR.

[12]  Andrea Vedaldi,et al.  Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[13]  Geoffrey E. Hinton,et al.  Layer Normalization , 2016, ArXiv.

[14]  Eunhyeok Park,et al.  Tag2Pix: Line Art Colorization Using Text Tag With SECat and Changing Loss , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[16]  Stefan Winkler,et al.  The Unusual Effectiveness of Averaging in GAN Training , 2018, ICLR.

[17]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Alexei A. Efros,et al.  Toward Multimodal Image-to-Image Translation , 2017, NIPS.

[19]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[20]  Jung-Woo Ha,et al.  StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Bernt Schiele,et al.  Generative Adversarial Text to Image Synthesis , 2016, ICML.

[22]  Taesung Park,et al.  Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Xiaohua Zhai,et al.  High-Fidelity Image Generation With Fewer Labels , 2019, ICML.

[24]  Yoshua Bengio,et al.  Feature-wise transformations , 2018, Distill.

[25]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[26]  Sebastian Nowozin,et al.  Which Training Methods for GANs do actually Converge? , 2018, ICML.

[27]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Fuxin Li,et al.  Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Jung-Woo Ha,et al.  NSML: A Machine Learning Platform That Enables You to Focus on Your Models , 2017, ArXiv.

[31]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[32]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[33]  Jung-Woo Ha,et al.  NSML: Meet the MLaaS platform with a real-world case study , 2018, ArXiv.

[34]  Yoshua Bengio,et al.  Generative Adversarial Networks , 2014, ArXiv.

[35]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[36]  Takeru Miyato,et al.  cGANs with Projection Discriminator , 2018, ICLR.

[37]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Jaakko Lehtinen,et al.  Few-Shot Unsupervised Image-to-Image Translation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[39]  Le Hui,et al.  Unsupervised Multi-Domain Image Translation with Domain-Specific Encoders/Decoders , 2017, 2018 24th International Conference on Pattern Recognition (ICPR).

[40]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[41]  Jaegul Choo,et al.  Coloring With Limited Data: Few-Shot Colorization via Memory Augmented Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[43]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[44]  Jeff Donahue,et al.  Large Scale Adversarial Representation Learning , 2019, NeurIPS.

[45]  Joost van de Weijer,et al.  SDIT: Scalable and Diverse Cross-domain Image Translation , 2019, ACM Multimedia.

[46]  Siwei Ma,et al.  Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Jonathon Shlens,et al.  Conditional Image Synthesis with Auxiliary Classifier GANs , 2016, ICML.

[48]  Maneesh Kumar Singh,et al.  DRIT++: Diverse Image-to-Image Translation via Disentangled Representations , 2019, International Journal of Computer Vision.

[49]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[51]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[52]  Yu Qiao,et al.  ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks , 2018, ECCV Workshops.

[53]  Hyunsoo Kim,et al.  Learning to Discover Cross-Domain Relations with Generative Adversarial Networks , 2017, ICML.

[54]  Xiaoming Yu,et al.  Multi-mapping Image-to-Image Translation via Learning Disentanglement , 2019, NeurIPS.

[55]  Philip Bachman,et al.  Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data , 2018, ICML.