Emergent Coordination Through Competition

We study the emergence of cooperative behaviors in reinforcement learning agents by introducing a challenging competitive multi-agent soccer environment with continuous simulated physics. We demonstrate that decentralized, population-based training with co-play can lead to a progression in agents' behaviors: from random, to simple ball chasing, and finally showing evidence of cooperation. Our study highlights several of the challenges encountered in large scale multi-agent training in continuous control. In particular, we demonstrate that the automatic optimization of simple shaping rewards, not themselves conducive to co-operative behavior, can lead to long-horizon team behavior. We further apply an evaluation scheme, grounded by game theoretic principals, that can assess agent performance in the absence of pre-defined evaluation tasks or human baselines.

[1]  L. Shapley Stochastic Games* , 1953, Proceedings of the National Academy of Sciences.

[2]  Arthur L. Samuel,et al.  Some Studies in Machine Learning Using the Game of Checkers , 1959, IBM J. Res. Dev..

[3]  A. Elo The rating of chessplayers, past and present , 1978 .

[4]  Michael L. Littman,et al.  Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.

[5]  Gerald Tesauro,et al.  Temporal difference learning and TD-Gammon , 1995, CACM.

[6]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[7]  Hiroaki Kitano,et al.  RoboCup: The Robot World Cup Initiative , 1997, AGENTS '97.

[8]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[9]  Ronen I. Brafman,et al.  R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..

[10]  Andrew Y. Ng,et al.  On Local Rewards and Scaling Distributed Reinforcement Learning , 2005, NIPS.

[11]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 2005, IEEE Transactions on Neural Networks.

[12]  Martin A. Riedmiller,et al.  Reinforcement learning for robot soccer , 2009, Auton. Robots.

[13]  Richard L. Lewis,et al.  Optimal rewards in multiagent teams , 2012, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).

[14]  Yuval Tassa,et al.  MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[15]  Guy Lever,et al.  Deterministic Policy Gradient Algorithms , 2014, ICML.

[16]  Yuval Tassa,et al.  Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.

[17]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2015, ICLR.

[18]  David Silver,et al.  Memory-based control with recurrent neural networks , 2015, ArXiv.

[19]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[20]  Shimon Whiteson,et al.  Learning to Communicate with Deep Multi-Agent Reinforcement Learning , 2016, NIPS.

[21]  Rob Fergus,et al.  Learning Multiagent Communication with Backpropagation , 2016, NIPS.

[22]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[23]  David Silver,et al.  Deep Reinforcement Learning from Self-Play in Imperfect-Information Games , 2016, ArXiv.

[24]  Matthew Hausknecht Cooperation and communication in multiagent deep reinforcement learning , 2016 .

[25]  Marc G. Bellemare,et al.  Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.

[26]  Yuval Tassa,et al.  Learning and Transfer of Modulated Locomotor Controllers , 2016, ArXiv.

[27]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[28]  Wojciech Zaremba,et al.  OpenAI Gym , 2016, ArXiv.

[29]  Kevin Waugh,et al.  DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker , 2017, ArXiv.

[30]  Shimon Whiteson,et al.  Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning , 2017, ICML.

[31]  Tom Schaul,et al.  StarCraft II: A New Challenge for Reinforcement Learning , 2017, ArXiv.

[32]  Yi Wu,et al.  Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.

[33]  Yuval Tassa,et al.  Emergence of Locomotion Behaviours in Rich Environments , 2017, ArXiv.

[34]  Max Jaderberg,et al.  Population Based Training of Neural Networks , 2017, ArXiv.

[35]  David Silver,et al.  A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning , 2017, NIPS.

[36]  Yuval Tassa,et al.  Data-efficient Deep Reinforcement Learning for Dexterous Manipulation , 2017, ArXiv.

[37]  Shimon Whiteson,et al.  Counterfactual Multi-Agent Policy Gradients , 2017, AAAI.

[38]  Karol Hausman,et al.  Learning an Embedding Space for Transferable Robot Skills , 2018, ICLR.

[39]  Pieter Abbeel,et al.  Emergence of Grounded Compositional Language in Multi-Agent Populations , 2017, AAAI.

[40]  Joel Z. Leibo,et al.  A Generalised Method for Empirical Game Theoretic Analysis , 2018, AAMAS.

[41]  Pieter Abbeel,et al.  Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments , 2018, ICLR.

[42]  Joel Z. Leibo,et al.  Human-level performance in first-person multiplayer games with population-based deep reinforcement learning , 2018, ArXiv.

[43]  Patrick MacAlpine,et al.  Overlapping layered learning , 2018, Artif. Intell..

[44]  Yuval Tassa,et al.  DeepMind Control Suite , 2018, ArXiv.

[45]  Jakub W. Pachocki,et al.  Emergent Complexity via Multi-Agent Competition , 2018, ICLR.

[46]  Martin A. Riedmiller,et al.  Learning by Playing - Solving Sparse Reward Tasks from Scratch , 2018, ICML.

[47]  Thore Graepel,et al.  Re-evaluating evaluation , 2018, NeurIPS.