On the Pitfalls of Measuring Emergent Communication

How do we know if communication is emerging in a multi-agent system? The vast majority of recent papers on emergent communication show that adding a communication channel leads to an increase in reward or task success. This is a useful indicator, but provides only a coarse measure of the agents' learned communication abilities. As we move towards more complex environments, it becomes imperative to have a set of finer tools that allow qualitative and quantitative insights into the emergence of communication. This may be especially useful to allow humans to monitor agents' behaviour, whether for fault detection, assessing performance, or even building trust. In this paper, we examine a few intuitive existing metrics for measuring communication, and show that they can be misleading. Specifically, by training deep reinforcement learning agents to play simple matrix games augmented with a communication channel, we find a scenario where agents appear to communicate (their messages provide information about their subsequent actions), and yet the messages do not impact the environment or the other agent in any way. We explain this phenomenon using ablation studies and by visualizing the representations of the learned policies. We also survey some commonly used metrics for measuring emergent communication, and provide recommendations as to when these metrics should be used.

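The failure mode described in the abstract can be made concrete with a simple plug-in mutual-information estimate over rollout data. The sketch below is illustrative only (the `mutual_information` helper and the toy rollout data are our own assumptions, not the paper's code): it shows how a message can be highly informative about the speaker's own next action while carrying no information about the listener's behaviour, which is why an informativeness metric alone can suggest communication where none is taking place.

```python
import numpy as np
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * np.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# Toy rollouts: the speaker's message perfectly "narrates" its own action
# (high speaker consistency), but the listener never reads the channel.
rng = np.random.default_rng(0)
speaker_actions = rng.integers(0, 2, size=10_000)
messages = speaker_actions.copy()                      # message = own action
listener_actions = rng.integers(0, 2, size=10_000)     # independent of messages

print("I(message; speaker action): ", mutual_information(messages, speaker_actions))
print("I(message; listener action):", mutual_information(messages, listener_actions))
```

A complementary check, in the spirit of the ablation studies mentioned above, is causal rather than correlational: fix or shuffle the messages at evaluation time and test whether the listener's actions or the task reward change. If they do not, the channel is not being used, regardless of how informative the messages appear.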