Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols

Learning to communicate through interaction, rather than relying on explicit supervision, is often considered a prerequisite for developing a general AI. We study a setting where two agents play a referential game and, starting from scratch, develop the communication protocol necessary to succeed in it. Unlike previous work, we require that the messages they exchange, at both training and test time, take the form of a language (i.e., sequences of discrete symbols). We compare a reinforcement learning approach with one using a differentiable relaxation (the straight-through Gumbel-softmax estimator) and observe that the latter converges much faster and results in more effective protocols. Interestingly, we also observe that the protocol induced by optimizing communication success exhibits a degree of compositionality and variability (i.e., the same information can be phrased in different ways), both properties characteristic of natural languages. As the ultimate goal is to ensure that communication is accomplished in natural language, we also perform experiments in which we inject prior information about natural language into our model and study the properties of the resulting protocol.
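The two gradient estimators compared above can be illustrated for a single discrete symbol of a message. The following is a minimal pure-Python sketch, not the authors' implementation. The vocabulary size, the logits, the temperature `tau`, the upstream gradient, the reward and the baseline are all hypothetical, and the helper names are illustrative. In the actual model, logits would be produced by the sender network at each step of the message. The straight-through variant emits a hard one-hot symbol in the forward pass but backpropagates through the relaxed Gumbel-softmax sample. REINFORCE only needs a scalar reward for the sampled symbol.

```python
import math
import random


def gumbel_softmax_sample(logits, tau, rng):
    """Relaxed one-hot sample y = softmax((logits + g) / tau), g_i ~ Gumbel(0, 1)."""
    gumbels = []
    for _ in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbels.append(-math.log(-math.log(u)))  # inverse-CDF Gumbel sample
    z = [(l + g) / tau for l, g in zip(logits, gumbels)]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def to_one_hot(y_soft):
    """Forward pass of the straight-through estimator: a discrete symbol."""
    k = max(range(len(y_soft)), key=lambda i: y_soft[i])
    return [1.0 if i == k else 0.0 for i in range(len(y_soft))]


def straight_through_grad(y_soft, grad_hard, tau):
    """Backward pass: reuse the relaxed sample's Jacobian for the one-hot output.

    With y = softmax(z / tau), dy_i/dz_j = y_i * (delta_ij - y_j) / tau, so
    dL/dlogits_j = y_j * (g_j - sum_i g_i * y_i) / tau  (Gumbel noise held fixed).
    """
    dot = sum(g * y for g, y in zip(grad_hard, y_soft))
    return [y * (g - dot) / tau for y, g in zip(y_soft, grad_hard)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_grad(logits, symbol, reward, baseline=0.0):
    """Score-function (REINFORCE) estimate: (reward - baseline) * d log p(symbol) / d logits."""
    p = softmax(logits)
    return [(reward - baseline) * ((1.0 if i == symbol else 0.0) - p_i)
            for i, p_i in enumerate(p)]


rng = random.Random(0)
logits = [2.0, 0.5, -1.0, 0.0]  # hypothetical sender scores over a 4-symbol vocabulary
tau = 1.0
y_soft = gumbel_softmax_sample(logits, tau, rng)
y_hard = to_one_hot(y_soft)  # the receiver sees a discrete symbol
# Hypothetical upstream gradient dL/dy_hard coming back from the receiver.
upstream = [0.3, -1.0, 0.2, 0.5]
grad_logits = straight_through_grad(y_soft, upstream, tau)
symbol = y_hard.index(1.0)
rl_grad = reinforce_grad(logits, symbol, reward=1.0, baseline=0.5)
```

The design trade-off is that the straight-through estimator passes the receiver's full gradient back to the sender. It is biased, but typically has much lower variance. REINFORCE is unbiased but sees only a scalar reward, which tends to make it noisier. This is one plausible reason for the faster convergence reported above, not a claim made by the sketch itself.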
