On the interaction between supervision and self-play in emergent communication

A promising approach to teaching artificial agents to use natural language involves human-in-the-loop training. However, recent work suggests that current machine learning methods are too data-inefficient to be trained this way from scratch. In this paper, we investigate the relationship between two categories of learning signals, with the ultimate goal of improving sample efficiency: imitating human language data via supervised learning, and maximizing reward in a simulated multi-agent environment via self-play (as done in emergent communication). We introduce the term supervised self-play (S2P) for algorithms that use both of these signals. We find that first training agents via supervised learning on human data and then via self-play outperforms the converse, suggesting that it is not beneficial to emerge languages from scratch. We then empirically investigate various S2P schedules that begin with supervised learning, in two environments: a Lewis signaling game with symbolic inputs, and an image-based referential game with natural language descriptions. Lastly, we introduce population-based approaches to S2P, which further improve performance over single-agent methods.
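The two-phase recipe the abstract describes can be illustrated with a minimal sketch: a tabular Lewis signaling game in which a speaker is first fit to a fixed "human" convention by supervised cross-entropy updates, and both agents are then fine-tuned with REINFORCE on communication reward during self-play. The game size, learning rate, number of steps, and the identity convention standing in for human data are all illustrative assumptions, not details from the paper.

```python
import math, random

random.seed(0)

N = 5     # number of states, messages, and actions (symmetric game)
LR = 0.5  # learning rate shared by both phases

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample(probs):
    r, c = random.random(), 0.0
    for i, p in enumerate(probs):
        c += p
        if r < c:
            return i
    return len(probs) - 1

# Tabular policies: speaker maps state -> message, listener maps message -> guess.
speaker = [[0.0] * N for _ in range(N)]
listener = [[0.0] * N for _ in range(N)]

# Stand-in for human language data: a fixed convention pairing state i with message i.
human_data = [(s, s) for s in range(N)]

# Phase 1 (supervised): fit the speaker to the human convention by
# cross-entropy gradient steps on its softmax policy.
for _ in range(200):
    for state, msg in human_data:
        probs = softmax(speaker[state])
        for m in range(N):
            speaker[state][m] += LR * ((1.0 if m == msg else 0.0) - probs[m])

# Phase 2 (self-play): both agents maximize communication reward via REINFORCE.
for _ in range(2000):
    state = random.randrange(N)
    sp = softmax(speaker[state])
    msg = sample(sp)
    lp = softmax(listener[msg])
    guess = sample(lp)
    reward = 1.0 if guess == state else 0.0
    for m in range(N):
        speaker[state][m] += LR * reward * ((1.0 if m == msg else 0.0) - sp[m])
    for a in range(N):
        listener[msg][a] += LR * reward * ((1.0 if a == guess else 0.0) - lp[a])

def accuracy():
    """Greedy communication accuracy over all states."""
    hits = 0
    for state in range(N):
        msg = max(range(N), key=lambda m: speaker[state][m])
        guess = max(range(N), key=lambda a: listener[msg][a])
        hits += guess == state
    return hits / N
```

Because the speaker is pretrained on the human convention before self-play begins, the listener only has to invert a near-fixed mapping, which mirrors the paper's finding that supervision-first schedules outperform emerging a language from scratch.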
