Global and Local Alignment Networks for Unpaired Image-to-Image Translation

The goal of unpaired image-to-image translation is to produce an output image reflecting the target domain’s style while keeping unrelated contents of the input source image unchanged. However, due to the lack of attention to the content change in existing methods, the semantic information from source images suffers from degradation during translation. In the paper, to address this issue, we introduce a novel approach, Global and Local Alignment Networks (GLA-Net). The global alignment network aims to transfer the input image from the source domain to the target domain. To effectively do so, we learn the parameters (mean and standard deviation) of multivariate Gaussian distributions as style features by using an MLP-Mixer based style encoder. To transfer the style more accurately, we employ an adaptive instance normalization layer in the encoder, with the parameters of the target multivariate Gaussian distribution as input. We also adopt regularization and likelihood losses to further reduce the domain gap and produce high-quality outputs. Additionally, we introduce a local alignment network, which employs a pretrained self-supervised model to produce an attention map via a novel local alignment loss, ensuring that the translation network focuses on relevant pixels. Extensive experiments conducted on five public datasets demonstrate that our method effectively generates sharper and more realistic images than existing approaches. Our code is available at https://github.com/ygjwd12345/GLANet.

[1]  Stefano Ermon,et al.  InfoVAE: Balancing Learning and Inference in Variational Autoencoders , 2019, AAAI.

[2]  Leon A. Gatys,et al.  Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Hyunsoo Kim,et al.  Learning to Discover Cross-Domain Relations with Generative Adversarial Networks , 2017, ICML.

[4]  Julien Mairal,et al.  Emerging Properties in Self-Supervised Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Jinwoo Shin,et al.  InstaGAN: Instance-aware Image-to-Image Translation , 2018, ICLR.

[6]  Nicu Sebe,et al.  AttentionGAN: Unpaired Image-to-Image Translation Using Attention-Guided Generative Adversarial Networks , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[7]  Jonathon Shlens,et al.  A Learned Representation For Artistic Style , 2016, ICLR.

[8]  Jung-Woo Ha,et al.  StarGAN v2: Diverse Image Synthesis for Multiple Domains , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Nathan Jacobs,et al.  Unifying Guided and Unguided Outdoor Image Synthesis , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[10]  Maneesh Kumar Singh,et al.  DRIT++: Diverse Image-to-Image Translation via Disentangled Representations , 2019, International Journal of Computer Vision.

[11]  Jianfei Cai,et al.  The Spatially-Correlative Loss for Various Image Translation Tasks , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Chen Qian,et al.  TransGaGa: Geometry-Aware Unsupervised Image-To-Image Translation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Vladlen Koltun,et al.  Photographic Image Synthesis with Cascaded Refinement Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Andrea Vedaldi,et al.  Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[15]  Nicu Sebe,et al.  Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Alexander Kolesnikov,et al.  MLP-Mixer: An all-MLP Architecture for Vision , 2021, NeurIPS.

[18]  Alexei A. Efros,et al.  Multimodal Image-to-Image Translation by Enforcing Bi-Cycle Consistency , 2017, NIPS 2017.

[19]  Dacheng Tao,et al.  Attention-GAN for Object Transfiguration in Wild Images , 2018, ECCV.

[20]  Kyoung Mu Lee,et al.  Accurate Image Super-Resolution Using Very Deep Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Xiaofeng Tao,et al.  Transient attributes for high-level understanding and editing of outdoor scenes , 2014, ACM Trans. Graph..

[22]  Peter Wonka,et al.  SEAN: Image Synthesis With Semantic Region-Adaptive Normalization , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Nicu Sebe,et al.  Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Kun Zhang,et al.  Geometry-Consistent Generative Adversarial Networks for One-Sided Unsupervised Domain Mapping , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Alexei A. Efros,et al.  Colorful Image Colorization , 2016, ECCV.

[26]  Jaakko Lehtinen,et al.  Few-Shot Unsupervised Image-to-Image Translation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[27]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Bin Fang,et al.  Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Harshad Rai,et al.  Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks , 2018 .

[30]  Alexei A. Efros,et al.  Contrastive Learning for Unpaired Image-to-Image Translation , 2020, ECCV.

[31]  Mingming Gong,et al.  Unaligned Image-to-Image Translation by Learning to Reweight , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[32]  Luc Van Gool,et al.  Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime , 2018, 2018 21st International Conference on Intelligent Transportation Systems (ITSC).

[33]  Taesung Park,et al.  Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Arthur Gretton,et al.  Demystifying MMD GANs , 2018, ICLR.

[35]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[36]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Yu-Ding Lu,et al.  DRIT++: Diverse Image-to-Image Translation via Disentangled Representations , 2020, International Journal of Computer Vision.

[39]  Lior Wolf,et al.  One-Sided Unsupervised Domain Mapping , 2017, NIPS.

[40]  Xiaoou Tang,et al.  Image Super-Resolution Using Deep Convolutional Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[42]  Luc Van Gool,et al.  Guided Curriculum Model Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Seong Joon Oh,et al.  Reliable Fidelity and Diversity Metrics for Generative Models , 2020, ICML.

[44]  Jan Kautz,et al.  Multimodal Unsupervised Image-to-Image Translation , 2018, ECCV.

[45]  Eric P. Xing,et al.  Generative Semantic Manipulation with Contrasting GAN , 2017, ArXiv.

[46]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[47]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Chao Yang,et al.  Show, Attend, and Translate: Unsupervised Image Translation With Self-Regularization and Attention , 2018, IEEE Transactions on Image Processing.

[49]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[50]  Xuguang Lai,et al.  Unsupervised Generative Adversarial Networks with Cross-model Weight Transfer Mechanism for Image-to-image Translation , 2021, 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).

[51]  Georg Heigold,et al.  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.

[52]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Serge J. Belongie,et al.  Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[54]  Thomas A. Funkhouser,et al.  Dilated Residual Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Ping Tan,et al.  DualGAN: Unsupervised Dual Learning for Image-to-Image Translation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[56]  Tao Xiang,et al.  Domain Generalization with MixStyle , 2021, ICLR.

[57]  Jun Fu,et al.  Dual Attention Network for Scene Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Minjae Kim,et al.  U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation , 2019, ICLR.