Deep Multi-Agent Reinforcement Learning for Decentralized Continuous Cooperative Control

Centralised training with decentralised execution (CTDE) is an important learning paradigm in multi-agent reinforcement learning (MARL). To make progress in CTDE, we introduce Multi-Agent MuJoCo, a novel benchmark suite of continuous robotic control tasks, in contrast to StarCraft II, the predominant benchmark environment for this setting. To demonstrate the utility of Multi-Agent MuJoCo, we present a range of benchmark results on the new suite, including a comparison of the state-of-the-art actor-critic method MADDPG against two novel variants of existing methods. These new methods outperform MADDPG on several Multi-Agent MuJoCo tasks. In addition, we show that value factorisation is key to this performance, while other algorithmic choices are not. This motivates extending the study of value factorisation from $Q$-learning to actor-critic algorithms.
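To make the factorisation idea concrete, below is a minimal sketch in PyTorch (an assumption; it is not the paper's implementation) contrasting a monolithic centralised critic, which conditions on the full state and joint action as in MADDPG, with an additively factored, VDN-style critic that sums per-agent utilities. Class names such as `JointCritic` and `FactoredCritic` are illustrative only, and additive mixing is just one instance of the factorisations the abstract refers to.

```python
import torch
import torch.nn as nn


class JointCritic(nn.Module):
    """Monolithic centralised critic: Q(s, a_1..a_n) from concatenated inputs."""

    def __init__(self, state_dim, act_dim, n_agents, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_agents * act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, joint_action):
        # state: (batch, state_dim), joint_action: (batch, n_agents * act_dim)
        return self.net(torch.cat([state, joint_action], dim=-1))


class FactoredCritic(nn.Module):
    """Additively factored critic: Q_tot = sum_i q_i(o_i, a_i) (VDN-style)."""

    def __init__(self, obs_dim, act_dim, n_agents, hidden=128):
        super().__init__()
        self.utils = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n_agents)
        )

    def forward(self, obs, actions):
        # obs: (batch, n_agents, obs_dim), actions: (batch, n_agents, act_dim)
        per_agent = [u(torch.cat([obs[:, i], actions[:, i]], dim=-1))
                     for i, u in enumerate(self.utils)]
        return torch.stack(per_agent, dim=1).sum(dim=1)  # (batch, 1)


if __name__ == "__main__":
    # Illustrative dimensions only (e.g. a 3-agent continuous control task).
    batch, n_agents, obs_dim, act_dim = 32, 3, 17, 6
    critic = FactoredCritic(obs_dim, act_dim, n_agents)
    obs = torch.randn(batch, n_agents, obs_dim)
    acts = torch.randn(batch, n_agents, act_dim)
    print(critic(obs, acts).shape)  # torch.Size([32, 1])
```

Because each per-agent utility depends only on that agent's own observation and action, the factored critic yields per-agent gradients directly and scales with the number of agents, which is one reason such structure can matter more than other algorithmic choices.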
