Multi-agent Communication meets Natural Language: Synergies between Functional and Structural Language Learning

We present a method for combining multi-agent communication and traditional data-driven approaches to natural language learning, with an end goal of teaching agents to communicate with humans in natural language. Our starting point is a language model that has been trained on generic, not task-specific language data. We then place this model in a multi-agent self-play environment that generates task-specific rewards used to adapt or modulate the model, turning it into a task-conditional language model. We introduce a new way for combining the two types of learning based on the idea of reranking language model samples, and show that this method outperforms others in communicating with humans in a visual referential communication task. Finally, we present a taxonomy of different types of language drift that can occur alongside a set of measures to detect them.

[1]  付伶俐 打磨Using Language,倡导新理念 , 2014 .

[2]  Kyunghyun Cho,et al.  Emergent Communication in a Multi-Modal, Multi-Step Referential Game , 2017, ICLR.

[3]  Christopher Potts,et al.  Learning in the Rational Speech Acts Model , 2015, ArXiv.

[4]  Robert Dale,et al.  Building applied natural language generation systems , 1997, Natural Language Engineering.

[5]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Shimon Whiteson,et al.  Learning to Communicate with Deep Multi-Agent Reinforcement Learning , 2016, NIPS.

[7]  Quoc V. Le,et al.  A Neural Conversational Model , 2015, ArXiv.

[8]  Alec Radford,et al.  Fine-Tuning Language Models from Human Preferences , 2019, ArXiv.

[9]  Laura Graesser,et al.  Emergent Linguistic Phenomena in Multi-Agent Communication Games , 2019, EMNLP.

[10]  Ross B. Girshick,et al.  Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Csr Young,et al.  How to Do Things With Words , 2009 .

[12]  H. H. Clark,et al.  Conceptual pacts and lexical choice in conversation. , 1996, Journal of experimental psychology. Learning, memory, and cognition.

[13]  Eugene Kharitonov,et al.  Anti-efficient encoding in emergent communication , 2019, NeurIPS.

[14]  Kyunghyun Cho,et al.  Countering Language Drift via Visual Grounding , 2019, EMNLP.

[15]  Samy Bengio,et al.  Context-Aware Captions from Context-Agnostic Supervision , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Dan Klein,et al.  Speaker-Follower Models for Vision-and-Language Navigation , 2018, NeurIPS.

[17]  Stephen Clark,et al.  Emergence of Linguistic Communication from Referential Games with Symbolic and Pixel Input , 2018, ICLR.

[18]  Dan Klein,et al.  Reasoning about Pragmatics with Neural Listeners and Speakers , 2016, EMNLP.

[19]  Hermann Ney,et al.  Improved backing-off for M-gram language modeling , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[20]  Ivan Titov,et al.  Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols , 2017, NIPS.

[21]  Lukás Burget,et al.  Recurrent neural network based language model , 2010, INTERSPEECH.

[22]  David A. Shamma,et al.  YFCC100M , 2015, Commun. ACM.

[23]  Marco Baroni,et al.  How agents see things: On visual representations in an emergent language game , 2018, EMNLP.

[24]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[25]  Bing Liu,et al.  Bootstrapping a Neural Conversational Agent with Dialogue Self-Play, Crowdsourcing and On-Line Reinforcement Learning , 2018, NAACL.

[26]  Steve J. Young,et al.  A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies , 2006, The Knowledge Engineering Review.

[27]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[28]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[29]  Yuchen Lu,et al.  Countering Language Drift with Seeded Iterated Learning , 2020, ICML.

[30]  Anca D. Dragan,et al.  On the Utility of Learning about Humans for Human-AI Coordination , 2019, NeurIPS.

[31]  Christopher Potts,et al.  Pragmatically Informative Image Captioning with Character-Level Inference , 2018, NAACL.

[32]  Bo Dai,et al.  Contrastive Learning for Image Captioning , 2017, NIPS.

[33]  Alexander Peysakhovich,et al.  Multi-Agent Cooperation and the Emergence of (Natural) Language , 2016, ICLR.

[34]  Ilya Sutskever,et al.  Language Models are Unsupervised Multitask Learners , 2019 .

[35]  YoungSteve,et al.  A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies , 2006 .

[36]  José M. F. Moura,et al.  Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog , 2017, EMNLP.

[37]  Joelle Pineau,et al.  On the interaction between supervision and self-play in emergent communication , 2020, ICLR.

[38]  C. Lawrence Zitnick,et al.  Bringing Semantics into Focus Using Visual Abstraction , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[40]  M. Engelmann The Philosophical Investigations , 2013 .