Paper information - Pictionary-Style Word Guessing on Hand-Drawn Object Sketches: Dataset, Analysis and Deep Network Models

The ability of intelligent agents to play games in a human-like fashion is popularly considered a benchmark of progress in Artificial Intelligence. In our work, we introduce the first computational model aimed at Pictionary, the popular word-guessing social game. We first introduce Sketch-QA, a guessing task. Styled after Pictionary, Sketch-QA uses incrementally accumulated sketch stroke sequences as visual data. Sketch-QA involves asking a fixed question (“What object is being drawn?”) and gathering open-ended guess-words from human guessers. We analyze the resulting dataset and present our findings. To mimic Pictionary-style guessing, we propose a deep neural model which generates guess-words in response to temporally evolving human-drawn object sketches. Our model even makes human-like mistakes while guessing, which strengthens its resemblance to human play. We evaluate our model on the large-scale guess-word dataset generated via the Sketch-QA task and compare it with various baselines. We also conduct a Visual Turing Test to obtain human impressions of the guess-words generated by humans and by our model. Experimental results demonstrate the promise of our approach for Pictionary and similarly themed games.

Ravi Kiran Sarvadevabhatla | Shiv Surya | R. Venkatesh Babu | Trisha Mittal
