DoodleFormer: Creative Sketch Drawing with Transformers

Creative sketching, or doodling, is an expressive activity in which imaginative and previously unseen depictions of everyday visual objects are drawn. Creative sketch generation is a challenging vision problem, where the task is to generate diverse yet realistic creative sketches with previously unseen compositions of everyday visual objects. Here, we propose a novel coarse-to-fine two-stage framework, DoodleFormer, that decomposes creative sketch generation into the creation of a coarse sketch composition followed by the incorporation of fine details into the sketch. We introduce graph-aware transformer encoders that effectively capture global dynamic as well as local static structural relations among different body parts. To ensure diversity of the generated creative sketches, we introduce a probabilistic coarse sketch decoder that explicitly models the variations of each sketch body part to be drawn. Experiments are performed on two creative sketch datasets: Creative Birds and Creative Creatures. Our qualitative, quantitative and human-based evaluations show that DoodleFormer outperforms the state of the art on both datasets, yielding realistic and diverse creative sketches. On Creative Creatures, DoodleFormer achieves an absolute gain of 25 in terms of Fréchet inception distance (FID) over the state of the art. We also demonstrate the effectiveness of DoodleFormer for the related applications of text-to-creative-sketch generation and sketch completion.
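The coarse-to-fine decomposition described above can be illustrated with a toy sketch. The code below is a minimal, hypothetical illustration (the function names, box parameterization, and stroke shapes are ours, not the paper's actual architecture): a first stage samples a coarse layout of one bounding box per body part, with a random latent injecting the diversity that the probabilistic decoder provides in DoodleFormer, and a second stage fills each box with fine stroke points.

```python
import numpy as np

rng = np.random.default_rng(0)

def coarse_stage(num_parts, z):
    """Stage 1 (illustrative): one (cx, cy, w, h) box per body part.

    The latent z perturbs part positions, standing in for the probabilistic
    coarse decoder that makes each sampled doodle different.
    """
    base = np.linspace(0.2, 0.8, num_parts)
    boxes = np.stack([base + 0.05 * z,            # x center per part
                      0.5 + 0.05 * z,             # y center per part
                      np.full(num_parts, 0.15),   # box width
                      np.full(num_parts, 0.15)],  # box height
                     axis=1)
    return np.clip(boxes, 0.0, 1.0)               # keep boxes on the canvas

def fine_stage(boxes, points_per_part=8):
    """Stage 2 (illustrative): fill each coarse box with a closed stroke."""
    strokes = []
    for cx, cy, w, h in boxes:
        t = np.linspace(0, 2 * np.pi, points_per_part, endpoint=False)
        strokes.append(np.stack([cx + 0.5 * w * np.cos(t),
                                 cy + 0.5 * h * np.sin(t)], axis=1))
    return strokes

z = rng.standard_normal(3)        # a new latent sample yields a new doodle
boxes = coarse_stage(3, z)
strokes = fine_stage(boxes)
print(len(strokes), strokes[0].shape)
```

The point of the decomposition is that diversity is decided cheaply at the layout level (stage 1), while stage 2 only has to render details consistent with the chosen layout.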
