FACMAC: Factored Multi-Agent Centralised Policy Gradients

Centralised training with decentralised execution (CTDE) is an important learning paradigm in multi-agent reinforcement learning (MARL). To make progress in CTDE, we introduce Multi-Agent MuJoCo, a novel benchmark suite that, unlike StarCraft II, the predominant benchmark environment, targets continuous robotic control tasks. To demonstrate the utility of Multi-Agent MuJoCo, we present a range of benchmark results on this new suite, including a comparison of the state-of-the-art actor-critic method MADDPG against two novel variants of existing methods. These new methods outperform MADDPG on several Multi-Agent MuJoCo tasks. In addition, we show that factorisation is key to performance, whereas other algorithmic choices are not. This motivates extending the study of value factorisation from $Q$-learning to actor-critic algorithms.
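As a rough illustration of the value-factorisation idea discussed above, the sketch below combines per-agent utilities $Q_i(o_i, a_i)$ into a joint value $Q_{tot}$ via a state-conditioned mixing network that an actor-critic learner can differentiate through. This is a minimal, generic sketch rather than the architecture used in this paper; the module names, dimensions, and the simple monotonic linear mixer are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's exact architecture) of a factored
# centralised critic: per-agent utilities Q_i(o_i, a_i) are mixed into Q_tot by a
# hypernetwork conditioned on the global state.
import torch
import torch.nn as nn


class AgentCritic(nn.Module):
    """Per-agent utility Q_i(o_i, a_i) for continuous actions."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))


class FactoredCritic(nn.Module):
    """Mixes per-agent utilities into Q_tot with a state-conditioned mixer."""

    def __init__(self, n_agents: int, obs_dim: int, act_dim: int, state_dim: int):
        super().__init__()
        self.agent_critics = nn.ModuleList(
            [AgentCritic(obs_dim, act_dim) for _ in range(n_agents)]
        )
        # Linear mixer whose weights are generated from the global state
        # (a hypernetwork, in the spirit of monotonic value factorisation).
        self.hyper_w = nn.Linear(state_dim, n_agents)
        self.hyper_b = nn.Linear(state_dim, 1)

    def forward(self, obs, act, state):
        # obs, act: lists of per-agent tensors; state: global state tensor.
        qs = torch.cat(
            [c(o, a) for c, o, a in zip(self.agent_critics, obs, act)], dim=-1
        )  # shape: (batch, n_agents)
        w = torch.abs(self.hyper_w(state))  # non-negative weights => monotonic mixing
        b = self.hyper_b(state)
        return (qs * w).sum(dim=-1, keepdim=True) + b  # Q_tot, shape (batch, 1)


if __name__ == "__main__":
    batch, n_agents, obs_dim, act_dim, state_dim = 8, 3, 10, 2, 30
    critic = FactoredCritic(n_agents, obs_dim, act_dim, state_dim)
    obs = [torch.randn(batch, obs_dim) for _ in range(n_agents)]
    act = [torch.randn(batch, act_dim) for _ in range(n_agents)]
    state = torch.randn(batch, state_dim)
    print(critic(obs, act, state).shape)  # torch.Size([8, 1])
```

Because $Q_{tot}$ is differentiable with respect to the joint action, gradients can flow from the mixed critic back into decentralised policies, which is what distinguishes an actor-critic use of factorisation from its $Q$-learning counterpart.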
