Iterative Update and Unified Representation for Multi-Agent Reinforcement Learning

Multi-agent systems have a wide range of applications in cooperative and competitive tasks. As the number of agents increases, nonstationarity gets more serious in multi-agent reinforcement learning (MARL), which brings great difficulties to the learning process. Besides, current mainstream algorithms configure each agent an independent network,so that the memory usage increases linearly with the number of agents which greatly slows down the interaction with the environment. Inspired by Generative Adversarial Networks (GAN), this paper proposes an iterative update method (IU) to stabilize the nonstationary environment. Further, we add first-person perspective and represent all agents by only one network which can change agents' policies from sequential compute to batch compute. Similar to continual lifelong learning, we realize the iterative update method in this unified representative network (IUUR). In this method, iterative update can greatly alleviate the nonstationarity of the environment, unified representation can speed up the interaction with environment and avoid the linear growth of memory usage. Besides, this method does not bother decentralized execution and distributed deployment. Experiments show that compared with MADDPG, our algorithm achieves state-of-the-art performance and saves wall-clock time by a large margin especially with more agents.

[1]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[2]  Michael P. Wellman,et al.  Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..

[3]  Peter Vrancx,et al.  Game Theory and Multi-agent Reinforcement Learning , 2012, Reinforcement Learning.

[4]  Stefan Wermter,et al.  Continual Lifelong Learning with Neural Networks: A Review , 2018, Neural Networks.

[5]  Chao Gao,et al.  Continual Match Based Training in Pommerman: Technical Report , 2018, ArXiv.

[6]  Max Jaderberg,et al.  Population Based Training of Neural Networks , 2017, ArXiv.

[7]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[8]  Guy Lever,et al.  Human-level performance in 3D multiplayer games with population-based reinforcement learning , 2018, Science.

[9]  Steve J. Young,et al.  Partially observable Markov decision processes for spoken dialog systems , 2007, Comput. Speech Lang..

[10]  Shimon Whiteson,et al.  Counterfactual Multi-Agent Policy Gradients , 2017, AAAI.

[11]  Yi Wu,et al.  Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.

[12]  Bart De Schutter,et al.  Multi-agent Reinforcement Learning: An Overview , 2010 .

[13]  Nikos A. Vlassis,et al.  Optimal and Approximate Q-value Functions for Decentralized POMDPs , 2008, J. Artif. Intell. Res..

[14]  S. Peng A Generalized dynamic programming principle and hamilton-jacobi-bellman equation , 1992 .

[15]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[16]  Peng Peng,et al.  Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games , 2017, 1703.10069.

[17]  Manuela M. Veloso,et al.  Multiagent Systems: A Survey from a Machine Learning Perspective , 2000, Auton. Robots.

[18]  Andrew W. Moore,et al.  Reinforcement Learning for Cooperating and Communicating Reactive Agents in Electrical Power Grids , 2000, Balancing Reactivity and Social Deliberation in Multi-Agent Systems.

[19]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[20]  Shimon Whiteson,et al.  QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning , 2018, ICML.

[21]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[22]  Michael L. Littman,et al.  Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.

[23]  Guillermo Ricardo Simari,et al.  Multiagent systems: a modern approach to distributed artificial intelligence , 2000 .

[24]  Nikos Vlassis,et al.  A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence I Mobk077-fm Synthesis Lectures on Artificial Intelligence and Machine Learning a Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence a Concise Introduction to Multiagent Systems and D , 2007 .

[25]  Léon Bottou,et al.  Wasserstein GAN , 2017, ArXiv.

[26]  Guy Lever,et al.  Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward , 2018, AAMAS.