Contrastive Learning for Unpaired Image-to-Image Translation

In image-to-image translation, each patch in the output should reflect the content of the corresponding patch in the input, independent of domain. We propose a straightforward method for doing so: maximizing mutual information between the two, using a framework based on contrastive learning. The method encourages two elements (corresponding patches) to map to a similar point in a learned feature space, relative to other elements (other patches) in the dataset, referred to as negatives. We explore several critical design choices for making contrastive learning effective in the image synthesis setting. Notably, we use a multilayer, patch-based approach, rather than operating on entire images. Furthermore, we draw negatives from within the input image itself, rather than from the rest of the dataset. We demonstrate that our framework enables one-sided translation in the unpaired image-to-image translation setting, while improving quality and reducing training time. In addition, our method can be extended to the training setting where each "domain" is only a single image.
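
To make the patchwise contrastive idea concrete, the following is a minimal sketch of an InfoNCE-style loss over patch features from a single layer. It is an illustration under stated assumptions, not the authors' implementation: the function name `patch_nce_loss`, the temperature value `tau=0.07`, the cosine-similarity scoring, and the toy feature vectors are all hypothetical choices made here. For each output patch, the input patch at the same location is treated as the positive, and the other patches of the same input image serve as negatives.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def patch_nce_loss(out_feats, in_feats, tau=0.07):
    """Mean InfoNCE loss over patch locations (illustrative sketch).

    out_feats[i] and in_feats[i] are feature vectors for the patch at
    location i in the output and input image. For query i, in_feats[i]
    is the positive; every in_feats[j] with j != i is a negative drawn
    from the same input image. tau is an assumed temperature.
    """
    queries = [normalize(v) for v in out_feats]
    keys = [normalize(v) for v in in_feats]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) / tau for k in keys]
        # Numerically stable log-sum-exp over the positive and all negatives.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the positive (index i) as the target class.
        total += log_sum - logits[i]
    return total / len(queries)


# Toy features: three patch locations with mutually orthogonal vectors.
feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = patch_nce_loss(feats, feats)                   # patches correspond
shuffled = patch_nce_loss(feats, feats[1:] + feats[:1])  # correspondence broken
print(aligned < shuffled)
```

In this toy case the loss is small when each output patch matches the input patch at the same location and large when the correspondence is broken. A multilayer variant would presumably sum such a term over features taken from several encoder layers.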
