SMIT: Stochastic Multi-Label Image-to-Image Translation

Cross-domain mapping has been a very active topic in recent years. Given an image, the goal is to translate it to a desired target domain, or to multiple domains in the multi-label case. The problem is highly challenging for three main reasons: (i) unpaired datasets, (ii) multiple attributes, and (iii) the multimodality (e.g., style) associated with the translation. Most existing state-of-the-art methods address only two of these challenges, i.e., either (i) and (ii) or (i) and (iii). In this work, we propose a joint framework that addresses all three (i, ii, iii), combining diversity and multi-mapping image-to-image translation and using a single generator to conditionally produce countless unique fake images that preserve the underlying characteristics of the source image. Our system does not use style regularization; instead, it uses an embedding representation, which we call domain embedding, for both domain and style. Extensive experiments on different datasets demonstrate the effectiveness of the proposed approach in comparison with the state-of-the-art on both multi-label and multimodal problems. Additionally, our method generalizes to different scenarios: continuous style interpolation, continuous label interpolation, and fine-grained mapping.
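As a purely illustrative aid for the continuous style and label interpolation mentioned above, the following minimal sketch builds evenly spaced conditioning vectors between two endpoints. It assumes simple linear interpolation; the function names, vector sizes, and values are hypothetical and are not taken from the paper, and no generator is implemented here.

```python
from typing import List


def lerp(a: List[float], b: List[float], t: float) -> List[float]:
    """Linearly interpolate between equal-length vectors a and b at t in [0, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolation_path(a: List[float], b: List[float], steps: int) -> List[List[float]]:
    """Return `steps` evenly spaced vectors from a to b, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(a, b, i / (steps - 1)) for i in range(steps)]


# Hypothetical conditioning inputs (sizes and values are illustrative only).
label_src = [1.0, 0.0, 0.0]  # attribute vector before the change
label_dst = [1.0, 1.0, 0.0]  # same vector with one more attribute switched on
style_a = [0.5, -1.0]        # two sampled style codes
style_b = [-0.5, 1.0]

# Intermediate labels and styles; in a conditional generator, each vector
# would be supplied together with the source image (assumption, not shown).
label_path = interpolation_path(label_src, label_dst, steps=5)
style_path = interpolation_path(style_a, style_b, steps=5)
```

Holding the label fixed while walking along `style_path` would correspond to style interpolation, and holding the style fixed while walking along `label_path` to label interpolation, under the linear-interpolation assumption stated above.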
