USIS: Unsupervised Semantic Image Synthesis

Semantic Image Synthesis (SIS) is a subclass of image-to-image translation where a photorealistic image is synthesized from a segmentation mask. SIS has mostly been addressed as a supervised problem. However, state-of-the-art methods depend on a huge amount of labeled data and cannot be applied in an unpaired setting. On the other hand, generic unpaired image-to-image translation frameworks underperform in comparison, because they color-code semantic layouts and feed them to traditional convolutional networks, which then learn correspondences in appearance instead of semantic content. In this initial work, we propose a new Unsupervised paradigm for Semantic Image Synthesis (USIS) as a first step towards closing the performance gap between paired and unpaired settings. Notably, the framework deploys a SPADE generator that learns to output images with visually separable semantic classes using a self-supervised segmentation loss. Furthermore, in order to match the color and texture distribution of real images without losing highfrequency information, we propose to use whole image wavelet-based discrimination. We test our methodology on 3 challenging datasets and demonstrate its ability to generate multimodal photorealistic images with an improved quality in the unpaired setting.

[1]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[2]  Jaakko Lehtinen,et al.  Analyzing and Improving the Image Quality of StyleGAN , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Raja Bala,et al.  Editing in Style: Uncovering the Local Semantics of GANs , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[5]  Harris Drucker,et al.  Improving generalization performance using double backpropagation , 1992, IEEE Trans. Neural Networks.

[6]  Alexei A. Efros,et al.  Toward Multimodal Image-to-Image Translation , 2017, NIPS.

[7]  Bernt Schiele,et al.  Generative Adversarial Text to Image Synthesis , 2016, ICML.

[8]  Jung-Woo Ha,et al.  StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[10]  Tieniu Tan,et al.  Wavelet Domain Generative Adversarial Network for Multi-scale Face Hallucination , 2019, International Journal of Computer Vision.

[11]  Ping Tan,et al.  DualGAN: Unsupervised Dual Learning for Image-to-Image Translation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Daniel Cohen-Or,et al.  SWAGAN , 2021, ACM Trans. Graph..

[13]  Thomas H. Li,et al.  SSD-GAN: Measuring the Realness in the Spatial and Spectral Domains , 2020, AAAI.

[14]  Alexei A. Efros,et al.  Contrastive Learning for Unpaired Image-to-Image Translation , 2020, ECCV.

[15]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Vladlen Koltun,et al.  Photographic Image Synthesis with Cascaded Refinement Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Seunghoon Hong,et al.  Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Kwang In Kim,et al.  Improving Shape Deformation in Unsupervised Image-to-Image Translation , 2018, ECCV.

[19]  Matthias Bethge,et al.  ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness , 2018, ICLR.

[20]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[21]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Peter Wonka,et al.  SEAN: Image Synthesis With Semantic Region-Adaptive Normalization , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Jan Kautz,et al.  Multimodal Unsupervised Image-to-Image Translation , 2018, ECCV.

[24]  Kun Zhang,et al.  Geometry-Consistent Generative Adversarial Networks for One-Sided Unsupervised Domain Mapping , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Bernt Schiele,et al.  You Only Need Adversarial Supervision for Semantic Image Synthesis , 2020, ICLR.

[26]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[27]  Serge J. Belongie,et al.  Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[28]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[29]  Smita Krishnaswamy,et al.  TraVeLGAN: Image-To-Image Translation by Transformation Vector Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Vladlen Koltun,et al.  Semi-Parametric Image Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Robert Li,et al.  Wavelet Pooling for Convolutional Neural Networks , 2018, ICLR.

[32]  Zhe Gan,et al.  AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Jan Kautz,et al.  Unsupervised Image-to-Image Translation Networks , 2017, NIPS.

[34]  Chen Qian,et al.  TransGaGa: Geometry-Aware Unsupervised Image-To-Image Translation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Concetto Spampinato,et al.  Semi Supervised Semantic Segmentation Using Generative Adversarial Network , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[36]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Vittorio Ferrari,et al.  COCO-Stuff: Thing and Stuff Classes in Context , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Swami Sankaranarayanan,et al.  Learning from Synthetic Data: Addressing Domain Shift for Semantic Segmentation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Shanxin Yuan,et al.  Wavelet-Based Dual-Branch Network for Image Demoireing , 2020, ECCV.

[40]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Greg Humphreys,et al.  Physically Based Rendering: From Theory to Implementation , 2004 .

[42]  Yuning Jiang,et al.  Unified Perceptual Parsing for Scene Understanding , 2018, ECCV.

[43]  Zhenan Sun,et al.  Attribute-Aware Face Aging With Wavelet-Based Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[47]  Tomas Pfister,et al.  Learning from Simulated and Unsupervised Images through Adversarial Training , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Wangmeng Zuo,et al.  Multi-Level Wavelet Convolutional Neural Networks , 2019, IEEE Access.

[49]  Taesung Park,et al.  Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Thomas Brox,et al.  Generating Images with Perceptual Similarity Metrics based on Deep Networks , 2016, NIPS.

[51]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[52]  Nicu Sebe,et al.  Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Nicu Sebe,et al.  Attention-Guided Generative Adversarial Networks for Unsupervised Image-to-Image Translation , 2019, 2019 International Joint Conference on Neural Networks (IJCNN).

[54]  Andrew Slavin Ross,et al.  Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients , 2017, AAAI.

[55]  Bolei Zhou,et al.  Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Jong Chul Ye,et al.  A deep convolutional neural network using directional wavelets for low‐dose X‐ray CT reconstruction , 2016, Medical physics.

[57]  Dumitru Erhan,et al.  Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Lior Wolf,et al.  One-Sided Unsupervised Domain Mapping , 2017, NIPS.

[59]  Eric P. Xing,et al.  Generative Semantic Manipulation with Mask-Contrasting GAN , 2018, ECCV.

[60]  Jung-Woo Ha,et al.  Photorealistic Style Transfer via Wavelet Transforms , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[61]  Mai Xu,et al.  Multi-level Wavelet-based Generative Adversarial Network for Perceptual Quality Enhancement of Compressed Video , 2020, ECCV.

[62]  Jeff Donahue,et al.  Large Scale GAN Training for High Fidelity Natural Image Synthesis , 2018, ICLR.

[63]  Tarik Dzanic,et al.  Fourier Spectrum Discrepancies in Deep Network Generated Images , 2019, NeurIPS.

[64]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[65]  Bernt Schiele,et al.  A U-Net Based Discriminator for Generative Adversarial Networks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[66]  Philip Bachman,et al.  Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data , 2018, ICML.

[67]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[68]  Qi Zhang,et al.  Super-resolution reconstruction algorithms based on fusion of deep learning mechanism and wavelet , 2019, AIPR '19.

[69]  Leon A. Gatys,et al.  Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[70]  Maneesh Kumar Singh,et al.  DRIT++: Diverse Image-to-Image Translation via Disentangled Representations , 2019, International Journal of Computer Vision.

[71]  Xiaogang Wang,et al.  Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis , 2019, NeurIPS.

[72]  Ingrid Daubechies,et al.  The wavelet transform, time-frequency localization and signal analysis , 1990, IEEE Trans. Inf. Theory.

[73]  Margret Keuper,et al.  Watch Your Up-Convolution: CNN Based Generative Deep Neural Networks Are Failing to Reproduce Spectral Distributions , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[74]  Thomas A. Funkhouser,et al.  Dilated Residual Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[75]  Xing Gao,et al.  A hybrid wavelet convolution network with sparse-coding for image super-resolution , 2016, 2016 IEEE International Conference on Image Processing (ICIP).