Resilient Multi-Agent Reinforcement Learning with Adversarial Value Decomposition

We focus on resilience in cooperative multi-agent systems, where agents can change their behavior due to updates or failures of hardware and software components. Current state-of-the-art approaches to cooperative multi-agent reinforcement learning (MARL) have either focused on idealized settings without any changes or on highly specialized scenarios where the number of changing agents is fixed, e.g., in extreme cases with only one productive agent. Therefore, we propose Resilient Adversarial value Decomposition with Antagonist-Ratios (RADAR). RADAR offers a value decomposition scheme to train competing teams of varying size for improved resilience against arbitrary agent changes. We evaluate RADAR in two cooperative multi-agent domains and show that RADAR achieves better worst-case performance w.r.t. arbitrary agent changes than state-of-the-art MARL approaches.
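
To make the competing-teams idea concrete, below is a minimal sketch of how per-agent utilities could be combined into protagonist and antagonist team values under a randomly sampled antagonist-ratio. It assumes a VDN-style additive factorization and PyTorch; the class and function names (AgentQNet, sample_teams, decomposed_team_values) and the ratio schedule are hypothetical illustrations, not the exact RADAR formulation.

```python
# Minimal sketch of adversarial value decomposition with an antagonist-ratio.
# Assumes a VDN-style additive factorization; all names and hyperparameters
# here are hypothetical illustrations, not the exact RADAR formulation.
import random

import torch
import torch.nn as nn


class AgentQNet(nn.Module):
    """Per-agent utility network Q_i(o_i, a_i)."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def sample_teams(n_agents: int, antagonist_ratio: float):
    """Randomly split agents into protagonist and antagonist teams.

    The antagonist-ratio controls how many agents act adversarially;
    sampling it per episode yields competing teams of varying size.
    """
    n_antagonists = int(round(antagonist_ratio * n_agents))
    antagonists = set(random.sample(range(n_agents), n_antagonists))
    protagonists = set(range(n_agents)) - antagonists
    return protagonists, antagonists


def decomposed_team_values(agent_nets, observations, actions,
                           protagonists, antagonists):
    """Additively decompose per-agent utilities into two team values.

    The protagonist team value is trained on the cooperative return,
    while the antagonist team value is trained on the negated return,
    so antagonists learn to disturb the cooperating agents.
    """
    utilities = []
    for net, obs_i, act_i in zip(agent_nets, observations, actions):
        q_i = net(obs_i).gather(-1, act_i.unsqueeze(-1)).squeeze(-1)
        utilities.append(q_i)
    q_protagonist = sum(utilities[i] for i in protagonists)
    q_antagonist = sum(utilities[i] for i in antagonists)
    return q_protagonist, q_antagonist


# Usage: 4 agents, antagonist-ratio drawn per episode (hypothetical schedule).
n_agents, obs_dim, n_actions = 4, 8, 5
nets = [AgentQNet(obs_dim, n_actions) for _ in range(n_agents)]
obs = [torch.randn(1, obs_dim) for _ in range(n_agents)]
acts = [torch.randint(0, n_actions, (1,)) for _ in range(n_agents)]
pro, ant = sample_teams(n_agents, antagonist_ratio=random.choice([0.0, 0.25, 0.5]))
q_pro, q_ant = decomposed_team_values(nets, obs, acts, pro, ant)
print("protagonists:", sorted(pro), "antagonists:", sorted(ant))
```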
