Learning Efficient Multi-agent Communication: An Information Bottleneck Approach

We consider the problem of limited-bandwidth communication in multi-agent reinforcement learning, where agents cooperate with the assistance of a communication protocol and a scheduler. The protocol and the scheduler jointly determine which agent communicates what message, and to whom. Under a limited-bandwidth constraint, the communication protocol must generate informative messages; at the same time, unnecessary communication connections should not be established, because they occupy the limited resources in vain. In this paper, we develop an Informative Multi-Agent Communication (IMAC) method that learns efficient communication protocols as well as scheduling. First, from the perspective of communication theory, we prove that the limited-bandwidth constraint requires low-entropy messages throughout the transmission. Then, inspired by the information bottleneck principle, we learn a valuable and compact communication protocol together with a weight-based scheduler. To demonstrate the efficiency of our method, we conduct extensive experiments on various cooperative and competitive multi-agent tasks with different numbers of agents and different bandwidths. We show that IMAC converges faster and yields more efficient communication among agents under limited bandwidth than many baseline methods.
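As a rough illustration of the information-bottleneck idea behind IMAC, the compactness of a message can be enforced by adding a variational compression penalty to the task objective. The sketch below assumes a Gaussian message posterior with parameters `mu` and `log_var` and a unit-Gaussian prior; the function names, the `beta` weight, and the exact loss form are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over message dims.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def ib_message_loss(task_loss, mu, log_var, beta=1e-3):
    # Variational-IB-style objective: task loss plus a beta-weighted
    # compression term that upper-bounds the mutual information
    # between the message and the agent's input.
    return task_loss + beta * gaussian_kl(mu, log_var)

# A unit-Gaussian message posterior matches the prior exactly,
# so it incurs zero compression penalty.
mu = np.zeros(4)
log_var = np.zeros(4)
print(ib_message_loss(1.0, mu, log_var))  # 1.0
```

Raising `beta` trades task performance for lower-entropy (more compressible) messages, which is how a bandwidth budget can be expressed as a soft constraint on the learned protocol.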
