Scones: towards conversational authoring of sketches

Iteratively refining and critiquing sketches are crucial steps in developing effective designs. We introduce Scones, a mixed-initiative, machine-learning-driven system that enables users to iteratively author sketches from text instructions. Scones uses deep learning to iteratively generate scenes of sketched objects, composed according to semantic specifications given in natural language. Scones exceeds state-of-the-art performance on a text-based scene modification task, and introduces a mask-conditioned sketching model that can generate sketches with poses specified by high-level scene information. In an exploratory user evaluation, participants reported enjoying an iterative drawing task with Scones and suggested additional features for further applications. We believe Scones is an early step towards automated, intelligent systems that support human-in-the-loop applications for communicating ideas through sketching in art and design.
