Zero-Pair Image to Image Translation using Domain Conditional Normalization

In this paper, we propose an approach based on domain conditional normalization (DCN) for zero-pair image-to-image translation, i.e., translating between two domains which have no paired training data available but each have paired training data with a third domain. We employ a single generator which has an encoder-decoder structure and analyze different implementations of domain conditional normalization to obtain the desired target domain output. The validation benchmark uses RGB-depth pairs and RGB-semantic pairs for training and compares performance for the depth-semantic translation task. The proposed approaches improve in qualitative and quantitative terms over the compared methods, while using much fewer parameters. Code available at this https URL

[1]  Luc Van Gool,et al.  SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects , 2020, ECCV.

[2]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Luc Van Gool,et al.  Extremely Weak Supervised Image-to-Image Translation for Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[5]  Alexia Jolicoeur-Martineau,et al.  The relativistic discriminator: a key element missing from standard GAN , 2018, ICLR.

[6]  Stefan Leutenegger,et al.  SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation? , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[7]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[8]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[9]  Joost van de Weijer,et al.  Mix and Match Networks: Cross-Modal Alignment for Zero-Pair Image-to-Image Translation , 2020, International Journal of Computer Vision.

[10]  Luc Van Gool,et al.  DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Luc Van Gool,et al.  ComboGAN: Unrestrained Scalability for Image Domain Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[12]  Alexei A. Efros,et al.  Toward Multimodal Image-to-Image Translation , 2017, NIPS.

[13]  Francesc Moreno-Noguer,et al.  GANimation: Anatomically-aware Facial Animation from a Single Image , 2018, ECCV.

[14]  Luc Van Gool,et al.  Exemplar Guided Unsupervised Image-to-Image Translation , 2018, ArXiv.

[15]  Serge J. Belongie,et al.  Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16]  Magnus Wrenninge,et al.  Synscapes: A Photorealistic Synthetic Dataset for Street Scene Parsing , 2018, ArXiv.

[17]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[19]  Hugo Larochelle,et al.  Modulating early visual processing by language , 2017, NIPS.

[20]  Jung-Woo Ha,et al.  StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Luc Van Gool,et al.  SMIT: Stochastic Multi-Label Image-to-Image Translation , 2018, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[23]  Alexei A. Efros,et al.  Colorful Image Colorization , 2016, ECCV.

[24]  Radu Timofte,et al.  DeepSEE: Deep Disentangled Semantic Explorative Extreme Super-Resolution , 2020, ArXiv.

[25]  Joost van de Weijer,et al.  Mix and match networks: multi-domain alignment for unpaired image-to-image translation , 2019, ArXiv.

[26]  Maneesh Kumar Singh,et al.  DRIT++: Diverse Image-to-Image Translation via Disentangled Representations , 2019, International Journal of Computer Vision.

[27]  Yu Qiao,et al.  ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks , 2018, ECCV Workshops.

[28]  Jan Kautz,et al.  Multimodal Unsupervised Image-to-Image Translation , 2018, ECCV.

[29]  François Laviolette,et al.  Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..

[30]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).