The Grammar of Emergent Languages

In this paper, we consider the syntactic properties of languages that emerge in referential games, using unsupervised grammar induction (UGI) techniques originally designed to analyse natural language. We first show that these UGI techniques are suitable for analysing emergent languages, and then study whether the languages that emerge in a typical referential game setup exhibit syntactic structure, and to what extent this depends on the maximum message length and the number of symbols that the agents are allowed to use. Our experiments demonstrate that a minimum message length and vocabulary size are required for structure to emerge, but they also illustrate that more sophisticated game scenarios are needed to obtain syntactic properties more akin to those observed in human language. We argue that UGI techniques should be part of the standard toolkit for analysing emergent languages, and we release a comprehensive library to facilitate such analysis for future researchers.
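
To make the analysis concrete: the paper applies established UGI systems from the NLP literature to corpora of agent messages, but the general recipe, inducing unlabelled constituency brackets from raw symbol sequences, can be illustrated with a deliberately simple stand-in. The minimal sketch below greedily brackets a message by merging the adjacent constituents whose boundary symbols have the highest pointwise mutual information; the corpus, vocabulary size, and message length are all illustrative assumptions, not the paper's actual setup or pipeline.

```python
import math
import random
from collections import Counter

# Toy emergent-language corpus: each message is a fixed-length sequence
# of integer symbols, standing in for the output of a sender agent in a
# referential game. (Illustrative placeholder data; a real analysis
# would use messages produced by trained agents.)
random.seed(0)
VOCAB = list(range(10))   # assumed vocabulary size
MSG_LEN = 6               # assumed maximum message length
corpus = [[random.choice(VOCAB) for _ in range(MSG_LEN)]
          for _ in range(500)]

# Unigram and adjacent-bigram counts over the whole corpus.
unigrams = Counter(s for msg in corpus for s in msg)
bigrams = Counter(p for msg in corpus for p in zip(msg, msg[1:]))
n_uni = sum(unigrams.values())
n_bi = sum(bigrams.values())

def pmi(a, b):
    """Pointwise mutual information of the adjacent symbol pair (a, b)."""
    if bigrams[(a, b)] == 0:
        return float("-inf")
    p_ab = bigrams[(a, b)] / n_bi
    return math.log(p_ab / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))

def bracket(msg):
    """Greedily merge the adjacent pair of constituents whose boundary
    symbols have the highest PMI, until one tree spans the message."""
    # Each node is (subtree, leftmost_symbol, rightmost_symbol).
    nodes = [(s, s, s) for s in msg]
    while len(nodes) > 1:
        i = max(range(len(nodes) - 1),
                key=lambda j: pmi(nodes[j][2], nodes[j + 1][1]))
        left, right = nodes[i], nodes[i + 1]
        nodes[i:i + 2] = [((left[0], right[0]), left[1], right[2])]
    return nodes[0][0]

# Nested tuples encode the induced (unlabelled) constituency brackets.
print(bracket(corpus[0]))
```

Running this on messages from trained agents, rather than the random placeholder corpus above, and comparing the induced brackets against random and right-branching baselines, would mirror the kind of structural analysis the paper advocates.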
