Modular Generative Adversarial Networks

Existing methods for multi-domain image-to-image translation (or generation) attempt to directly map an input image (or a random vector) to an image in one of the output domains. However, most existing methods have limited scalability and robustness, since they require building independent models for each pair of domains in question. This leads to two significant shortcomings: (1) the need to train exponential number of pairwise models, and (2) the inability to leverage data from other domains when training a particular pairwise mapping. Inspired by recent work on module networks, this paper proposes ModularGAN for multi-domain image generation and image-to-image translation. ModularGAN consists of several reusable and composable modules that carry on different functions (e.g., encoding, decoding, transformations). These modules can be trained simultaneously, leveraging data from all domains, and then combined to construct specific GAN networks at test time, according to the specific image translation task. This leads to ModularGAN's superior flexibility of generating (or translating to) an image in any desired domain. Experimental results demonstrate that our model not only presents compelling perceptual results but also outperforms state-of-the-art methods on multi-domain facial attribute transfer.

[1]  David Zhang,et al.  Deep Identity-aware Transfer of Facial Attributes , 2016, ArXiv.

[2]  Bernt Schiele,et al.  A Hybrid Model for Identity Obfuscation by Face Replacement , 2018, ECCV.

[3]  Martin Wattenberg,et al.  Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation , 2016, TACL.

[4]  Jinwen Ma,et al.  ELEGANT: Exchanging Latent Encodings with GAN for Transferring Multiple Face Attributes , 2018, ECCV.

[5]  John E. Hopcroft,et al.  Stacked Generative Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[7]  Jonathon Shlens,et al.  Conditional Image Synthesis with Auxiliary Classifier GANs , 2016, ICML.

[8]  Bogdan Raducanu,et al.  Invertible Conditional GANs for image editing , 2016, ArXiv.

[9]  Aykut Erdem,et al.  Learning to Generate Images of Outdoor Scenes from Attributes and Semantic Layouts , 2016, ArXiv.

[10]  Dimitris N. Metaxas,et al.  StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Bohyung Han,et al.  Visual Reference Resolution using Attention Memory for Visual Dialog , 2017, NIPS.

[12]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[13]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[14]  Wei Liu,et al.  Unsupervised Image-to-Image Translation with Stacked Cycle-Consistent Adversarial Networks , 2018, ECCV.

[15]  Bo Zhao,et al.  Multi-View Image Generation from a Single-View , 2017, ACM Multimedia.

[16]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[17]  Wei Shen,et al.  Learning Residual Images for Face Attribute Manipulation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[19]  Fisher Yu,et al.  Scribbler: Controlling Deep Image Synthesis with Sketch and Color , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Dan Klein,et al.  Learning to Compose Neural Networks for Question Answering , 2016, NAACL.

[22]  Bernt Schiele,et al.  Learning What and Where to Draw , 2016, NIPS.

[23]  Hiroshi Ishikawa,et al.  Globally and locally consistent image completion , 2017, ACM Trans. Graph..

[24]  Dan Klein,et al.  Neural Module Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Bernt Schiele,et al.  Generative Adversarial Text to Image Synthesis , 2016, ICML.

[26]  Li Fei-Fei,et al.  Inferring and Executing Programs for Visual Reasoning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[27]  Qiong Zhang,et al.  Generating Handwritten Chinese Characters Using CycleGAN , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[28]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[30]  Honglak Lee,et al.  Attribute2Image: Conditional Image Generation from Visual Attributes , 2015, ECCV.

[31]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Zhe Gan,et al.  AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Jung-Woo Ha,et al.  StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Léon Bottou,et al.  Wasserstein GAN , 2017, ArXiv.