Learning to Communicate Efficiently with Group Division in Decentralized Multi-agent Cooperation

Recent advances in multi-agent reinforcement learning show that agents can spontaneously learn when and what to communicate with each other to support effective cooperation. However, existing approaches assume a fully connected network with unlimited bandwidth, which is impractical in many real-world scenarios. For instance, in many multi-robot applications, robots are connected only through an unstable wireless network with limited bandwidth. Therefore, agents must learn a communication strategy that takes the consumption of network resources into account. This paper proposes a group division-based attentional communication model (GDAC), which divides agents into groups according to the "attention" they assign to one another in the learned communication strategy. Through this attention mechanism, agents are dynamically grouped by task relevance, and communication takes place only within the same group. As a result, GDAC avoids a fully connected communication architecture and can significantly reduce bandwidth consumption at runtime. The model has been applied to an environmental exploration task with a group of agents. The results show that GDAC effectively reduces the total volume of communication messages and yields improved performance over existing fully connected communication architectures.
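The abstract does not specify how attention scores translate into groups, so the following is only a minimal illustrative sketch, not the paper's implementation: it assumes scaled dot-product attention over per-agent observation encodings, a simple threshold rule for grouping (the function names, the threshold, and the mean-message aggregation are all hypothetical), and intra-group message exchange only, which is the bandwidth-saving property described above.

```python
# Hypothetical sketch of attention-based group division (NOT the GDAC architecture):
# agents attend over each other's encodings, mutually attentive agents share a group,
# and messages are exchanged (here: averaged) only within that group.
import numpy as np

rng = np.random.default_rng(0)

def attention_scores(encodings):
    """Scaled dot-product attention between agent encodings (assumed form)."""
    d = encodings.shape[-1]
    logits = encodings @ encodings.T / np.sqrt(d)
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

def divide_into_groups(scores, threshold=0.2):
    """Greedy grouping: agents whose mutual attention exceeds a threshold
    share a group; ungrouped agents stay alone (illustrative rule only)."""
    n = scores.shape[0]
    group_of = [-1] * n
    groups = []
    for i in range(n):
        if group_of[i] != -1:
            continue
        group = [i]
        group_of[i] = len(groups)
        for j in range(i + 1, n):
            if group_of[j] == -1 and scores[i, j] > threshold and scores[j, i] > threshold:
                group.append(j)
                group_of[j] = len(groups)
        groups.append(group)
    return groups

def communicate(messages, groups):
    """Each agent receives only its own group's aggregated message, so no
    bandwidth is spent communicating with agents outside the group."""
    received = np.zeros_like(messages)
    for group in groups:
        received[group] = messages[group].mean(axis=0)
    return received

# Toy run: 6 agents with 8-dimensional encodings and messages.
enc = rng.normal(size=(6, 8))
msg = rng.normal(size=(6, 8))
groups = divide_into_groups(attention_scores(enc))
print(groups)                               # e.g. [[0, 3], [1], [2, 4], [5]]
print(communicate(msg, groups).shape)       # (6, 8)
```

Compared with a fully connected scheme, where every agent broadcasts to all N-1 peers, this restricts message passing to group members, so the per-step traffic scales with group sizes rather than with N^2.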
