论文信息 - Learning Multiagent Communication with Backpropagation

Learning Multiagent Communication with Backpropagation

Many tasks in AI require the collaboration of multiple agents. Typically, the communication protocol between agents is manually specified and not altered during training. In this paper we explore a simple neural model, called CommNet, that uses continuous communication for fully cooperative tasks. The model consists of multiple agents and the communication between them is learned alongside their policy. We apply this model to a diverse set of tasks, demonstrating the ability of the agents to learn to communicate amongst themselves, yielding improved performance over non-communicative agents and baselines. In some cases, it is possible to interpret the language devised by the agents, revealing simple but effective strategies for solving the task at hand.

[1] Francisco S. Melo,et al. QueryPOMDP: POMDP-Based Communication in Multiagent Systems , 2011, EUMAS.

[2] A. Kamiya,et al. Learning of communication codes in multi-agent reinforcement learning problem , 2008, 2008 IEEE Conference on Soft Computing in Industrial Applications.

[3] Wolfram Burgard,et al. A Probabilistic Approach to Collaborative Multi-Robot Localization , 2000, Auton. Robots.

[4] Andrew G. Barto,et al. Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.

[5] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[6] Richard S. Zemel,et al. Gated Graph Sequence Neural Networks , 2015, ICLR.

[7] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[8] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.

[9] Ah Chung Tsoi,et al. The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.

[10] Xiaofeng Wang,et al. Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games , 2002, NIPS.

[11] Victor R. Lesser,et al. Coordinating multi-agent reinforcement learning with limited communication , 2013, AAMAS.

[12] Eduardo F. Morales,et al. An Introduction to Reinforcement Learning , 2011 .

[13] Michael L. Littman,et al. Value-function reinforcement learning in Markov games , 2001, Cognitive Systems Research.

[14] Reza Olfati-Saber,et al. Consensus and Cooperation in Networked Multi-Agent Systems , 2007, Proceedings of the IEEE.

[15] C. Lee Giles,et al. Learning Communication for Multi-agent Systems , 2002, WRAC.

[16] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[17] David Silver,et al. Move Evaluation in Go Using Deep Convolutional Neural Networks , 2014, ICLR.

[18] Wenwu Yu,et al. An Overview of Recent Progress in the Study of Distributed Multi-Agent Coordination , 2012, IEEE Transactions on Industrial Informatics.

[19] Judea Pearl,et al. Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach , 1982, AAAI.

[20] Honglak Lee,et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning , 2014, NIPS.

[21] Jason Weston,et al. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks , 2015, ICLR.

[22] Bart De Schutter,et al. A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[23] Shimon Whiteson,et al. Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks , 2016, ArXiv.

[24] Anibal Ollero,et al. Distributed Autonomous Robotic Systems 6 , 2007 .

[25] Kam-Fai Wong,et al. Towards Neural Network-based Reasoning , 2015, ArXiv.

[26] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..

[27] Richard Socher,et al. Dynamic Memory Networks for Visual and Textual Question Answering , 2016, ICML.

[28] Carlos Guestrin,et al. Multiagent Planning with Factored MDPs , 2001, NIPS.

[29] Jason Weston,et al. Curriculum learning , 2009, ICML '09.

[30] Lukasz Kaiser,et al. Neural GPUs Learn Algorithms , 2015, ICLR.

[31] Javier de Lope Asiaín,et al. Coordination of communication in robot teams by reinforcement learning , 2011, Robotics Auton. Syst..

[32] Maja J. Mataric,et al. Reinforcement Learning in the Multi-Robot Domain , 1997, Auton. Robots.

[33] Leslie Pack Kaelbling,et al. Efficient Distributed Reinforcement Learning through Agreement , 2008, DARS.

[34] Rob Fergus,et al. MazeBase: A Sandbox for Learning from Games , 2015, ArXiv.

[35] Manuela M. Veloso,et al. Towards collaborative and adversarial learning: a case study in robotic soccer , 1998, Int. J. Hum. Comput. Stud..

[36] Dorian Kodelja,et al. Multiagent cooperation and competition with deep reinforcement learning , 2015, PloS one.

[37] Jason Weston,et al. End-To-End Memory Networks , 2015, NIPS.

[38] Martin Lauer,et al. An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems , 2000, ICML.