论文信息 - USIS: Unsupervised Semantic Image Synthesis

USIS: Unsupervised Semantic Image Synthesis

Semantic Image Synthesis (SIS) is a subclass of image-to-image translation where a photorealistic image is synthesized from a segmentation mask. SIS has mostly been addressed as a supervised problem. However, state-of-the-art methods depend on a huge amount of labeled data and cannot be applied in an unpaired setting. On the other hand, generic unpaired image-to-image translation frameworks underperform in comparison, because they color-code semantic layouts and feed them to traditional convolutional networks, which then learn correspondences in appearance instead of semantic content. In this initial work, we propose a new Unsupervised paradigm for Semantic Image Synthesis (USIS) as a first step towards closing the performance gap between paired and unpaired settings. Notably, the framework deploys a SPADE generator that learns to output images with visually separable semantic classes using a self-supervised segmentation loss. Furthermore, in order to match the color and texture distribution of real images without losing highfrequency information, we propose to use whole image wavelet-based discrimination. We test our methodology on 3 challenging datasets and demonstrate its ability to generate multimodal photorealistic images with an improved quality in the unpaired setting.

[1] Vladlen Koltun,et al. Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[2] Jaakko Lehtinen,et al. Analyzing and Improving the Image Quality of StyleGAN , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Raja Bala,et al. Editing in Style: Uncovering the Local Semantics of GANs , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[5] Harris Drucker,et al. Improving generalization performance using double backpropagation , 1992, IEEE Trans. Neural Networks.

[6] Alexei A. Efros,et al. Toward Multimodal Image-to-Image Translation , 2017, NIPS.

[7] Bernt Schiele,et al. Generative Adversarial Text to Image Synthesis , 2016, ICML.

[8] Jung-Woo Ha,et al. StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9] Li Fei-Fei,et al. Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[10] Tieniu Tan,et al. Wavelet Domain Generative Adversarial Network for Multi-scale Face Hallucination , 2019, International Journal of Computer Vision.

[11] Ping Tan,et al. DualGAN: Unsupervised Dual Learning for Image-to-Image Translation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12] Daniel Cohen-Or,et al. SWAGAN , 2021, ACM Trans. Graph..

[13] Thomas H. Li,et al. SSD-GAN: Measuring the Realness in the Spatial and Spectral Domains , 2020, AAAI.

[14] Alexei A. Efros,et al. Contrastive Learning for Unpaired Image-to-Image Translation , 2020, ECCV.

[15] Sebastian Ramos,et al. The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Vladlen Koltun,et al. Photographic Image Synthesis with Cascaded Refinement Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17] Seunghoon Hong,et al. Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18] Kwang In Kim,et al. Improving Shape Deformation in Unsupervised Image-to-Image Translation , 2018, ECCV.

[19] Matthias Bethge,et al. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness , 2018, ICLR.

[20] 拓海杉山,et al. “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[21] Timo Aila,et al. A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Peter Wonka,et al. SEAN: Image Synthesis With Semantic Region-Adaptive Normalization , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23] Jan Kautz,et al. Multimodal Unsupervised Image-to-Image Translation , 2018, ECCV.

[24] Kun Zhang,et al. Geometry-Consistent Generative Adversarial Networks for One-Sided Unsupervised Domain Mapping , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Bernt Schiele,et al. You Only Need Adversarial Supervision for Semantic Image Synthesis , 2020, ICLR.

[26] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[27] Serge J. Belongie,et al. Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[28] Sepp Hochreiter,et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[29] Smita Krishnaswamy,et al. TraVeLGAN: Image-To-Image Translation by Transformation Vector Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Vladlen Koltun,et al. Semi-Parametric Image Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31] Robert Li,et al. Wavelet Pooling for Convolutional Neural Networks , 2018, ICLR.

[32] Zhe Gan,et al. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33] Jan Kautz,et al. Unsupervised Image-to-Image Translation Networks , 2017, NIPS.

[34] Chen Qian,et al. TransGaGa: Geometry-Aware Unsupervised Image-To-Image Translation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Concetto Spampinato,et al. Semi Supervised Semantic Segmentation Using Generative Adversarial Network , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[36] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37] Vittorio Ferrari,et al. COCO-Stuff: Thing and Stuff Classes in Context , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38] Swami Sankaranarayanan,et al. Learning from Synthetic Data: Addressing Domain Shift for Semantic Segmentation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39] Shanxin Yuan,et al. Wavelet-Based Dual-Branch Network for Image Demoireing , 2020, ECCV.

[40] Alexei A. Efros,et al. Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41] Greg Humphreys,et al. Physically Based Rendering: From Theory to Implementation , 2004 .

[42] Yuning Jiang,et al. Unified Perceptual Parsing for Scene Understanding , 2018, ECCV.

[43] Zhenan Sun,et al. Attribute-Aware Face Aging With Wavelet-Based Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44] Jan Kautz,et al. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45] Andreas Geiger,et al. Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[46] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[47] Tomas Pfister,et al. Learning from Simulated and Unsupervised Images through Adversarial Training , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48] Wangmeng Zuo,et al. Multi-Level Wavelet Convolutional Neural Networks , 2019, IEEE Access.

[49] Taesung Park,et al. Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50] Thomas Brox,et al. Generating Images with Perceptual Similarity Metrics based on Deep Networks , 2016, NIPS.

[51] Iasonas Kokkinos,et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[52] Nicu Sebe,et al. Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53] Nicu Sebe,et al. Attention-Guided Generative Adversarial Networks for Unsupervised Image-to-Image Translation , 2019, 2019 International Joint Conference on Neural Networks (IJCNN).

[54] Andrew Slavin Ross,et al. Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients , 2017, AAAI.

[55] Bolei Zhou,et al. Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56] Jong Chul Ye,et al. A deep convolutional neural network using directional wavelets for low‐dose X‐ray CT reconstruction , 2016, Medical physics.

[57] Dumitru Erhan,et al. Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58] Lior Wolf,et al. One-Sided Unsupervised Domain Mapping , 2017, NIPS.

[59] Eric P. Xing,et al. Generative Semantic Manipulation with Mask-Contrasting GAN , 2018, ECCV.

[60] Jung-Woo Ha,et al. Photorealistic Style Transfer via Wavelet Transforms , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[61] Mai Xu,et al. Multi-level Wavelet-based Generative Adversarial Network for Perceptual Quality Enhancement of Compressed Video , 2020, ECCV.

[62] Jeff Donahue,et al. Large Scale GAN Training for High Fidelity Natural Image Synthesis , 2018, ICLR.

[63] Tarik Dzanic,et al. Fourier Spectrum Discrepancies in Deep Network Generated Images , 2019, NeurIPS.

[64] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[65] Bernt Schiele,et al. A U-Net Based Discriminator for Generative Adversarial Networks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[66] Philip Bachman,et al. Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data , 2018, ICML.

[67] Jaakko Lehtinen,et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[68] Qi Zhang,et al. Super-resolution reconstruction algorithms based on fusion of deep learning mechanism and wavelet , 2019, AIPR '19.

[69] Leon A. Gatys,et al. Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[70] Maneesh Kumar Singh,et al. DRIT++: Diverse Image-to-Image Translation via Disentangled Representations , 2019, International Journal of Computer Vision.

[71] Xiaogang Wang,et al. Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis , 2019, NeurIPS.

[72] Ingrid Daubechies,et al. The wavelet transform, time-frequency localization and signal analysis , 1990, IEEE Trans. Inf. Theory.

[73] Margret Keuper,et al. Watch Your Up-Convolution: CNN Based Generative Deep Neural Networks Are Failing to Reproduce Spectral Distributions , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[74] Thomas A. Funkhouser,et al. Dilated Residual Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[75] Xing Gao,et al. A hybrid wavelet convolution network with sparse-coding for image super-resolution , 2016, 2016 IEEE International Conference on Image Processing (ICIP).