Learning and Testing Resilience in Cooperative Multi-Agent Systems

State-of-the-art multi-agent reinforcement learning has achieved remarkable success in recent years. This success rests mainly on the assumption that all teammates cooperate perfectly to optimize a global objective and achieve a common goal. While this may hold in the ideal case, these approaches could fail in practice, since in multi-agent systems (MAS) any agent may be a potential source of failure. In this paper, we focus on resilience in cooperative MAS and propose an Antagonist-Ratio Training Scheme (ARTS), which reformulates the original target MAS as a mixed cooperative-competitive game between a group of protagonists, which represent agents of the target MAS, and a group of antagonists, which represent failures in the MAS. While the protagonists can learn robust policies to ensure resilience against failures, the antagonists can learn malicious behavior to provide an adequate test suite for other MAS. We empirically evaluate ARTS in a cyber-physical production domain and show its effectiveness with respect to resilience and testing capabilities.

Jan Wieghardt | Horst Sauer | Marc Zeller | Thomas Gabor | Claudia Linnhoff-Popien | Thomy Phan | Andreas Sedlmeier | Fabian Ritz | Reiner Schmid | Bernhard Kempter | Cornel Klein

[1] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.

[2] Shimon Whiteson, et al. Counterfactual Multi-Agent Policy Gradients, 2017, AAAI.

[3] Yi Wu, et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, 2017, NIPS.

[4] Shimon Whiteson, et al. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning, 2018, ICML.

[5] Dorian Kodelja, et al. Multiagent cooperation and competition with deep reinforcement learning, 2015, PloS one.

[6] Sergey Levine, et al. Adversarial Policies: Attacking Deep Reinforcement Learning, 2019, ICLR.

[7] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[8] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[9] Myra B. Cohen, et al. An orchestrated survey of methodologies for automated software test case generation, 2013, J. Syst. Softw.

[10] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.

[11] Guy Lever, et al. Emergent Coordination Through Competition, 2019, ICLR.

[12] Kagan Tumer, et al. Optimal Payoff Functions for Members of Collectives, 2001, Adv. Complex Syst.

[13] Yi Wu, et al. Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient, 2019, AAAI.

[14] David Silver, et al. A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning, 2017, NIPS.

[15] Joan Bruna, et al. Intriguing properties of neural networks, 2013, ICLR.

[16] Mi-Ching Tsai, et al. Robust and Optimal Control, 2014.

[17] David Barber, et al. Thinking Fast and Slow with Deep Learning and Tree Search, 2017, NIPS.

[18] Lenz Belzner, et al. Leveraging Statistical Multi-Agent Online Planning with Emergent Value Function Approximation, 2018, AAMAS.

[19] Pushmeet Kohli, et al. Uncovering Surprising Behaviors in Reinforcement Learning via Worst-case Analysis, 2018.

[20] Oriol Vinyals, et al. Synthesizing Programs for Images using Reinforced Adversarial Learning, 2018, ICML.

[21] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[22] Jan Wieghardt, et al. Scenario co-evolution for reinforcement learning on a grid world smart factory domain, 2019, GECCO.

[23] Joel Z. Leibo, et al. Malthusian Reinforcement Learning, 2018, AAMAS.

[24] Lenz Belzner, et al. The scenario coevolution paradigm: adaptive quality assurance for adaptive systems, 2020, International Journal on Software Tools for Technology Transfer.

[25] Joel Z. Leibo, et al. Multi-agent Reinforcement Learning in Sequential Social Dilemmas, 2017, AAMAS.

[26] Jun Morimoto, et al. Robust Reinforcement Learning, 2005, Neural Computation.

[27] Guy Lever, et al. Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward, 2018, AAMAS.

[28] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.

[29] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.

[30] Abhinav Gupta, et al. Robust Adversarial Reinforcement Learning, 2017, ICML.

[31] Guy Lever, et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning, 2018, Science.

[32] Jon Edvardsson, et al. A Survey on Automatic Test Data Generation, 2002.

[33] Andrew S. Tanenbaum, et al. Distributed systems: Principles and Paradigms, 2001.

[34] Frans A. Oliehoek, et al. A Concise Introduction to Decentralized POMDPs, 2016, SpringerBriefs in Intelligent Systems.

[35] Ming Tan, et al. Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents, 1997, ICML.

[36] David A. Wagner, et al. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, 2018, ICML.

[37] Pushmeet Kohli, et al. Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures, 2018, ICLR.

[38] Balaraman Ravindran, et al. EPOpt: Learning Robust Neural Network Policies Using Model Ensembles, 2016, ICLR.

[39] Mykel J. Kochenderfer, et al. Cooperative Multi-agent Control Using Deep Reinforcement Learning, 2017, AAMAS Workshops.

[40] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.

[41] Leslie Pack Kaelbling, et al. All learning is Local: Multi-agent Learning in Global Reward Games, 2003, NIPS.

[42] Yung Yi, et al. QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning, 2019, ICML.

[43] Kenneth O. Stanley, et al. POET: open-ended coevolution of environments and their optimized solutions, 2019, GECCO.

[44] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.

[45] Pushmeet Kohli, et al. Adversarial Risk and the Dangers of Evaluating Against Weak Attacks, 2018, ICML.