CerfGAN: A Compact, Effective, Robust, and Fast Model for Unsupervised Multi-Domain Image-to-Image Translation

In this paper, we aim at solving the multi-domain image-to-image translation problem with a unified model in an unsupervised manner. The most successful work in this area refers to StarGAN, which works well in tasks like face attribute modulation. However, StarGAN is unable to match multiple translation mappings when encountering general translations with very diverse domain shifts. On the other hand, StarGAN adopts an Encoder-Decoder-Discriminator (EDD) architecture, where the model is time-consuming and unstable to train. To this end, we propose a Compact, effective, robust, and fast GAN model, termed CerfGAN, to solve the above problem. In principle, CerfGAN contains a novel component, i.e., a multi-class discriminator (MCD), which gives the model an extremely powerful ability to match multiple translation mappings. To stabilize the training process, MCD also plays a role of the encoder in CerfGAN, which saves a lot of computation and memory costs. We perform extensive experiments to verify the effectiveness of the proposed method. Quantitatively, CerfGAN is demonstrated to handle a serial of image-to-image translation tasks including style transfer, season transfer, face hallucination, etc, where the input images are sampled from diverse domains. The comparisons to several recently proposed approaches demonstrate the superiority and novelty of the proposed method.

[1]  Radim Sára,et al.  Spatial Pattern Templates for Recognition of Objects with Regular Structure , 2013, GCPR.

[2]  Taesung Park,et al.  CyCADA: Cycle-Consistent Adversarial Domain Adaptation , 2017, ICML.

[3]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[4]  Ping Tan,et al.  DualGAN: Unsupervised Dual Learning for Image-to-Image Translation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[6]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[7]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[8]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[9]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[10]  Amit R.Sharma,et al.  Face Photo-Sketch Synthesis and Recognition , 2012 .

[11]  Jung-Woo Ha,et al.  StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Philip H. S. Torr,et al.  Multi-agent Diverse Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Léon Bottou,et al.  Wasserstein GAN , 2017, ArXiv.

[15]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[16]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[17]  Xuelong Li,et al.  A Comprehensive Survey to Face Hallucination , 2013, International Journal of Computer Vision.

[18]  Jan Kautz,et al.  Unsupervised Image-to-Image Translation Networks , 2017, NIPS.

[19]  Navdeep Jaitly,et al.  Adversarial Autoencoders , 2015, ArXiv.

[20]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  John E. Hopcroft,et al.  Stacked Generative Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Dimitris N. Metaxas,et al.  StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[26]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[28]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[29]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[30]  Hyunsoo Kim,et al.  Learning to Discover Cross-Domain Relations with Generative Adversarial Networks , 2017, ICML.

[31]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[34]  Yann LeCun,et al.  Energy-based Generative Adversarial Network , 2016, ICLR.