ROMAX: Certifiably Robust Deep Multiagent Reinforcement Learning via Convex Relaxation

In a multirobot system, a number of cyber-physical attacks (e.g., communication hijacking, observation perturbations) can challenge the robustness of agents. This robustness issue worsens in multiagent reinforcement learning (MARL), where the environment is non-stationary: because all agents learn simultaneously, each agent's changing policy alters the effective transition and reward functions faced by the others. In this paper, we propose a minimax MARL approach that infers the worst-case policy update of the other agents. Since the minimax formulation is computationally intractable to solve exactly, we apply convex relaxation of the neural networks to solve the inner minimization problem. This relaxation makes the learned policy robust when interacting with peer agents whose behaviors may differ significantly, and it yields a certified bound on the original optimization problem. We evaluate our approach on multiple mixed cooperative-competitive tasks and show that it outperforms the previous state-of-the-art approaches.
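
To make the convex-relaxation step concrete, below is a minimal sketch of how the intractable inner minimization over the other agents' actions could be replaced by a certified lower bound on a critic network, assuming linear-relaxation bound propagation (CROWN) via a library such as auto_LiRPA. The critic architecture and all names here (Critic, robust_q_lower_bound) are illustrative assumptions, not the paper's exact implementation.

    # Sketch: lower-bounding the inner minimization
    #   min_{||delta||_inf <= eps} Q_i(s, a_i, a_-i + delta)
    # with convex relaxation (CROWN bounds from auto_LiRPA).
    import torch
    import torch.nn as nn
    from auto_LiRPA import BoundedModule, BoundedTensor
    from auto_LiRPA.perturbations import PerturbationLpNorm

    class Critic(nn.Module):
        """Q_i(s, a_i, a_-i): a small ReLU MLP over the joint state-action input."""
        def __init__(self, in_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, 64), nn.ReLU(),
                                     nn.Linear(64, 1))
        def forward(self, x):
            return self.net(x)

    def robust_q_lower_bound(critic, state, a_i, a_minus_i, eps):
        """Certified lower bound on Q over an L-inf ball around a_-i.

        Only the other agents' actions are perturbed, so the bounds are
        tight (zero-width) on the state and own-action coordinates.
        """
        x = torch.cat([state, a_i, a_minus_i], dim=-1)
        mask = torch.zeros_like(x)
        mask[..., state.shape[-1] + a_i.shape[-1]:] = 1.0  # perturb a_-i only
        ptb = PerturbationLpNorm(norm=float("inf"),
                                 x_L=x - eps * mask, x_U=x + eps * mask)
        # In practice the BoundedModule would be built once and reused.
        bounded_net = BoundedModule(critic, torch.empty_like(x))
        lb, _ = bounded_net.compute_bounds(x=(BoundedTensor(x, ptb),),
                                           method="CROWN")
        return lb  # lb <= min_delta Q(s, a_i, a_-i + delta)

The returned lb could then stand in for the inner minimum when forming the robust temporal-difference target; because the relaxation is a sound outer approximation of the network, the bound on the original minimax objective is certified rather than heuristic.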
