DoodleFormer: Creative Sketch Drawing with Transformers

Creative sketching, or doodling, is an expressive activity in which imaginative and previously unseen depictions of everyday visual objects are drawn. Creative sketch generation is a challenging vision problem, where the task is to generate diverse yet realistic creative sketches with previously unseen compositions of everyday visual objects. Here, we propose a novel coarse-to-fine two-stage framework, DoodleFormer, that decomposes creative sketch generation into the creation of a coarse sketch composition followed by the incorporation of fine details into the sketch. We introduce graph-aware transformer encoders that effectively capture global dynamic as well as local static structural relations among different body parts. To ensure diversity of the generated creative sketches, we introduce a probabilistic coarse sketch decoder that explicitly models the variations of each sketch body part to be drawn. Experiments are performed on two creative sketch datasets: Creative Birds and Creative Creatures. Our qualitative, quantitative and human-based evaluations show that DoodleFormer outperforms the state of the art on both datasets, yielding realistic and diverse creative sketches. On Creative Creatures, DoodleFormer achieves an absolute gain of 25 in terms of Fréchet inception distance (FID) over the state of the art. We also demonstrate the effectiveness of DoodleFormer for the related applications of text-to-creative-sketch generation and sketch completion.
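The coarse-to-fine decomposition described above can be illustrated with a toy sketch. The code below is a minimal, hypothetical illustration (the function names, box parameterization, and stroke shapes are ours, not the paper's actual architecture): a first stage samples a coarse layout of one bounding box per body part, with a random latent injecting the diversity that the probabilistic decoder provides in DoodleFormer, and a second stage fills each box with fine stroke points.

```python
import numpy as np

rng = np.random.default_rng(0)

def coarse_stage(num_parts, z):
    """Stage 1 (illustrative): one (cx, cy, w, h) box per body part.

    The latent z perturbs part positions, standing in for the probabilistic
    coarse decoder that makes each sampled doodle different.
    """
    base = np.linspace(0.2, 0.8, num_parts)
    boxes = np.stack([base + 0.05 * z,            # x center per part
                      0.5 + 0.05 * z,             # y center per part
                      np.full(num_parts, 0.15),   # box width
                      np.full(num_parts, 0.15)],  # box height
                     axis=1)
    return np.clip(boxes, 0.0, 1.0)               # keep boxes on the canvas

def fine_stage(boxes, points_per_part=8):
    """Stage 2 (illustrative): fill each coarse box with a closed stroke."""
    strokes = []
    for cx, cy, w, h in boxes:
        t = np.linspace(0, 2 * np.pi, points_per_part, endpoint=False)
        strokes.append(np.stack([cx + 0.5 * w * np.cos(t),
                                 cy + 0.5 * h * np.sin(t)], axis=1))
    return strokes

z = rng.standard_normal(3)        # a new latent sample yields a new doodle
boxes = coarse_stage(3, z)
strokes = fine_stage(boxes)
print(len(strokes), strokes[0].shape)
```

The point of the decomposition is that diversity is decided cheaply at the layout level (stage 1), while stage 2 only has to render details consistent with the chosen layout.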
