Paper Information - StarGAN v2: Diverse Image Synthesis for Multiple Domains

A good image-to-image translation model should learn a mapping between different visual domains while satisfying two properties: 1) diversity of generated images and 2) scalability over multiple domains. Existing methods address only one of these issues, either offering limited diversity or requiring multiple models to cover all domains. We propose StarGAN v2, a single framework that tackles both and shows significantly improved results over the baselines. Experiments on CelebA-HQ and a new animal faces dataset (AFHQ) validate its superiority in terms of visual quality, diversity, and scalability. To better assess image-to-image translation models, we release AFHQ, a dataset of high-quality animal faces with large inter- and intra-domain differences. The code, pretrained models, and dataset are available at https://github.com/clovaai/stargan-v2.

Jung-Woo Ha | Youngjung Uh | Jaejun Yoo | Yunjey Choi
