Multi-mapping Image-to-Image Translation via Learning Disentanglement

Recent advances in image-to-image translation focus on learning one-to-many mappings from two perspectives: multi-modal translation and multi-domain translation. However, existing methods consider only one of the two perspectives, so a multi-modal method cannot handle multi-domain translation and vice versa. To address this issue, we propose a novel unified model that bridges the two objectives. First, we disentangle the input images into latent representations using an encoder-decoder architecture with conditional adversarial training in the feature space. Then, we encourage the generator to learn multi-mappings through random cross-domain translation. As a result, we can manipulate different parts of the latent representations to perform multi-modal and multi-domain translations simultaneously. Experiments demonstrate that our method outperforms state-of-the-art methods.
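The idea of manipulating different parts of the latent representation can be illustrated with a toy sketch. This is not the model described above: the split into a content code, a style code, and a one-hot domain label is an assumption made for illustration, and the random linear maps `W_enc_c`, `W_enc_s`, and `W_dec` are hypothetical stand-ins for trained encoder and generator networks. No adversarial training is performed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed, for illustration only).
IMG_DIM, CONTENT_DIM, STYLE_DIM, NUM_DOMAINS = 16, 8, 4, 3

# Hypothetical linear stand-ins for the trained encoder and generator.
W_enc_c = rng.normal(size=(CONTENT_DIM, IMG_DIM))
W_enc_s = rng.normal(size=(STYLE_DIM, IMG_DIM))
W_dec = rng.normal(size=(IMG_DIM, CONTENT_DIM + STYLE_DIM + NUM_DOMAINS))


def encode(x):
    """Split a flattened image into an assumed content code and style code."""
    return W_enc_c @ x, W_enc_s @ x


def one_hot(domain):
    """Encode a domain index as a one-hot label vector."""
    label = np.zeros(NUM_DOMAINS)
    label[domain] = 1.0
    return label


def decode(content, style, domain):
    """Generate an image from the three parts of the latent representation."""
    return W_dec @ np.concatenate([content, style, one_hot(domain)])


x = rng.normal(size=IMG_DIM)  # a fake flattened input image
content, style = encode(x)

# Multi-domain translation: keep content and style, vary the domain label.
multi_domain = [decode(content, style, d) for d in range(NUM_DOMAINS)]

# Multi-modal translation: keep content and domain, resample the style code.
multi_modal = [decode(content, rng.normal(size=STYLE_DIM), 0) for _ in range(3)]

# Random cross-domain translation: pick a random target domain and style.
target = int(rng.integers(NUM_DOMAINS))
random_cross = decode(content, rng.normal(size=STYLE_DIM), target)
```

In this sketch, changing only the domain label or only the style code yields different outputs from the same content code, which is the sense in which one generator can serve both translation settings.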
