Paper information - Self-Supervised Sketch-to-Image Synthesis

Self-Supervised Sketch-to-Image Synthesis

Imagining a colored, realistic image from an arbitrarily drawn sketch is a human capability that we are eager for machines to mimic. Unlike previous methods that either require sketch-image pairs or utilize low-quantity detected edges as sketches, we study the exemplar-based sketch-to-image (s2i) synthesis task in a self-supervised learning manner, eliminating the need for paired sketch data. To this end, we first propose an unsupervised method to efficiently synthesize line sketches for general RGB-only datasets. With the synthetic paired data, we then present a self-supervised Auto-Encoder (AE) that decouples the content and style features of sketches and RGB images, and synthesizes images that are both content-faithful to the sketches and style-consistent with the RGB images. While prior works employ either the cycle-consistency loss or dedicated attentional modules to enforce content/style fidelity, we show the AE's superior performance with pure self-supervision. To further improve synthesis quality at high resolution, we also leverage an adversarial network to refine the details of the synthetic images. Extensive experiments at 1024×1024 resolution demonstrate new state-of-the-art performance of the proposed model on the CelebA-HQ and Wiki-Art datasets. Moreover, with the proposed sketch generator, the model shows promising performance on style mixing and style transfer, which require synthesized images to be both style-consistent and semantically meaningful. Our code is available at this https URL, and an online demo of our model is available at this https URL.
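The idea of building sketch-image training pairs from an RGB-only dataset can be illustrated with a toy sketch. This is not the paper's sketch generator: the abstract does not describe how that generator works, so the example substitutes a plain luminance-gradient threshold as a stand-in. The function names (`synthesize_sketch`, `make_training_pair`) and the `threshold` value are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Sketch = List[List[int]]


def to_gray(image: Image) -> List[List[float]]:
    """Convert RGB pixels to luminance using Rec. 601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in image]


def synthesize_sketch(image: Image, threshold: float = 40.0) -> Sketch:
    """Stand-in sketch generator (an assumption, not the paper's method):
    mark a pixel as a stroke (1) when the local luminance change is large."""
    gray = to_gray(image)
    height, width = len(gray), len(gray[0])
    sketch = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Forward differences, clamped at the image border.
            dx = gray[y][min(x + 1, width - 1)] - gray[y][x]
            dy = gray[min(y + 1, height - 1)][x] - gray[y][x]
            if (dx * dx + dy * dy) ** 0.5 > threshold:
                sketch[y][x] = 1
    return sketch


def make_training_pair(image: Image) -> Tuple[Sketch, Image]:
    """Pair a synthetic sketch with its source image; the image itself
    serves as the reconstruction target, so no human-drawn sketch is needed."""
    return synthesize_sketch(image), image


# Toy 4x4 image: dark left half, bright right half.
dark, bright = (10, 10, 10), (240, 240, 240)
toy_image = [[dark, dark, bright, bright] for _ in range(4)]
sketch, target = make_training_pair(toy_image)
```

In this toy setting the only stroke column is the one where the dark half meets the bright half. A model trained on such pairs would take the sketch as its content input and the RGB image as both style input and reconstruction target, which is the sense in which the supervision comes from the data itself.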
