Interactive Sketch & Fill: Multiclass Sketch-to-Image Translation

We propose an interactive GAN-based sketch-to-image translation method that helps novice users easily create images of simple objects. The user starts with a sparse sketch and a desired object category, and the network then recommends its plausible completion(s) and shows a corresponding synthesized image. This enables a feedback loop, where the user can edit the sketch based on the network's recommendations, while the network is able to better synthesize the image that the user might have in mind. In order to use a single model for a wide array of object classes, we introduce a gating-based approach for class conditioning, which allows us to generate distinct classes without feature mixing, from a single generator network.

[1]  Léon Bottou,et al.  Towards Principled Methods for Training Generative Adversarial Networks , 2017, ICLR.

[2]  Ole Winther,et al.  Autoencoding beyond pixels using a learned similarity metric , 2015, ICML.

[3]  Aaron C. Courville,et al.  Adversarially Learned Inference , 2016, ICLR.

[4]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Alexei A. Efros,et al.  Colorful Image Colorization , 2016, ECCV.

[7]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[8]  D. Cohen,et al.  Why can't most people draw what they see? , 1997, Journal of experimental psychology. Human perception and performance.

[9]  David Pfau,et al.  Unrolled Generative Adversarial Networks , 2016, ICLR.

[10]  Oriol Vinyals,et al.  Synthesizing Programs for Images using Reinforced Adversarial Learning , 2018, ICML.

[11]  Paul Smolensky,et al.  Information processing in dynamical systems: foundations of harmony theory , 1986 .

[12]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[13]  Ian J. Goodfellow,et al.  NIPS 2016 Tutorial: Generative Adversarial Networks , 2016, ArXiv.

[14]  Ivan E. Sutherland,et al.  Sketch pad a man-machine graphical communication system , 1964, DAC.

[15]  Taesung Park,et al.  Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Serge J. Belongie,et al.  Convolutional Networks with Adaptive Inference Graphs , 2017, International Journal of Computer Vision.

[17]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[18]  Alexei A. Efros,et al.  Toward Multimodal Image-to-Image Translation , 2017, NIPS.

[19]  Jonathon Shlens,et al.  Conditional Image Synthesis with Auxiliary Classifier GANs , 2016, ICML.

[20]  Alexei A. Efros,et al.  Generative Visual Manipulation on the Natural Image Manifold , 2016, ECCV.

[21]  Philip H. S. Torr,et al.  Multi-agent Diverse Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Yong Jae Lee,et al.  ShadowDraw: real-time user guidance for freehand drawing , 2011, ACM Trans. Graph..

[23]  Eli Shechtman,et al.  Im2Pencil: Controllable Pencil Illustration From Photographs , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[25]  Wojciech Zaremba,et al.  Improved Techniques for Training GANs , 2016, NIPS.

[26]  Alex Graves,et al.  Conditional Image Generation with PixelCNN Decoders , 2016, NIPS.

[27]  James Hays,et al.  SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Kristen Grauman,et al.  Fine-Grained Visual Comparisons with Local Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Sebastian Nowozin,et al.  Which Training Methods for GANs do actually Converge? , 2018, ICML.

[30]  Andrea Vedaldi,et al.  Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[31]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[32]  Aaron C. Courville,et al.  FiLM: Visual Reasoning with a General Conditioning Layer , 2017, AAAI.

[33]  Trevor Darrell,et al.  Adversarial Feature Learning , 2016, ICLR.

[34]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[35]  Pieter Abbeel,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[36]  A. Savitzky,et al.  Smoothing and Differentiation of Data by Simplified Least Squares Procedures. , 1964 .

[37]  Léon Bottou,et al.  Wasserstein GAN , 2017, ArXiv.

[38]  Ryan Schmidt,et al.  On expert performance in 3D curve-drawing tasks , 2009, SBIM '09.

[39]  Rob Fergus,et al.  Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks , 2015, NIPS.

[40]  K. Sasaki,et al.  Learning to simplify , 2016, ACM Trans. Graph..

[41]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[42]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[43]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Chuang Gan,et al.  Sparse, Smart Contours to Represent and Edit Images , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Douglas Eck,et al.  A Neural Representation of Sketch Drawings , 2017, ICLR.

[46]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[47]  Serge J. Belongie,et al.  Residual Networks Behave Like Ensembles of Relatively Shallow Networks , 2016, NIPS.

[48]  Marc Alexa,et al.  Photosketcher: Interactive Sketch-Based Image Synthesis , 2011, IEEE Computer Graphics and Applications.

[49]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[50]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[51]  Matthias Zwicker,et al.  Faceshop , 2018, ACM Trans. Graph..

[52]  Jan Kautz,et al.  Multimodal Unsupervised Image-to-Image Translation , 2018, ECCV.

[53]  Fisher Yu,et al.  Scribbler: Controlling Deep Image Synthesis with Sketch and Color , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Han Zhang,et al.  Self-Attention Generative Adversarial Networks , 2018, ICML.

[55]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[57]  Serge J. Belongie,et al.  Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[58]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Sung Yong Shin,et al.  On pixel-based texture synthesis by non-parametric sampling , 2006, Comput. Graph..

[60]  Adam Finkelstein,et al.  Where do people draw lines? , 2008, ACM Trans. Graph..

[61]  Yann LeCun,et al.  Deep multi-scale video prediction beyond mean square error , 2015, ICLR.

[62]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[63]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[64]  Yoshua Bengio,et al.  Mode Regularized Generative Adversarial Networks , 2016, ICLR.