Learning to Draw: Emergent Communication through Sketching

Evidence that visual communication preceded written language and provided a basis for it goes back to prehistory, in forms such as cave and rock paintings depicting traces of our distant ancestors. Emergent communication research has sought to explore how agents can learn to communicate in order to collaboratively solve tasks. Existing research has focused on language, with a learned communication channel transmitting sequences of discrete tokens between the agents. In this work, we explore a visual communication channel between agents that are allowed to draw with simple strokes. Our agents are parameterised by deep neural networks, and the drawing procedure is differentiable, allowing for end-to-end training. In the framework of a referential communication game, we demonstrate that agents can not only successfully learn to communicate by drawing but, with appropriate inductive biases, can do so in a fashion that humans can interpret. We hope to encourage future research to consider visual communication as a more flexible and directly interpretable alternative for training collaborative agents.
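To illustrate what a differentiable drawing procedure can look like, the sketch below renders a single straight stroke by letting pixel intensity decay smoothly with distance from the line segment. This is only a minimal illustration under stated assumptions: the abstract does not specify the rasteriser, so the exponential falloff, the `sigma` width parameter, and the function names `segment_distance_sq` and `rasterise_stroke` are all hypothetical. The point is that every pixel value is a smooth function of the stroke endpoints almost everywhere, which is what would let gradients flow from a receiving agent back to the drawing agent if the same computation were written in an automatic differentiation framework.

```python
import math


def segment_distance_sq(px, py, x0, y0, x1, y1):
    """Squared Euclidean distance from point (px, py) to the segment (x0, y0)-(x1, y1)."""
    dx, dy = x1 - x0, y1 - y0
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate stroke: both endpoints coincide.
        t = 0.0
    else:
        # Project the point onto the line, then clamp to stay on the segment.
        t = ((px - x0) * dx + (py - y0) * dy) / length_sq
        t = max(0.0, min(1.0, t))
    cx, cy = x0 + t * dx, y0 + t * dy
    return (px - cx) ** 2 + (py - cy) ** 2


def rasterise_stroke(start, end, size=8, sigma=0.75):
    """Render one stroke on a size x size grid (assumed soft falloff, not the paper's exact method).

    Returns a list of rows; image[row][col] lies in (0, 1] and equals 1 on the stroke.
    """
    (x0, y0), (x1, y1) = start, end
    return [
        [
            math.exp(-segment_distance_sq(col, row, x0, y0, x1, y1) / sigma ** 2)
            for col in range(size)
        ]
        for row in range(size)
    ]


if __name__ == "__main__":
    # Draw a diagonal stroke and print it as coarse ASCII art.
    image = rasterise_stroke((1.0, 1.0), (6.0, 6.0))
    for row in image:
        print("".join("#" if value > 0.5 else "." for value in row))
```

A smaller `sigma` gives a thinner, sharper stroke but a weaker gradient signal for pixels far from the line, so the width acts as a trade-off between visual crispness and ease of optimisation in this sketch.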
