Paper Information - Safe Deep Reinforcement Learning for Multi-Agent Systems with Continuous Action Spaces

Safe Deep Reinforcement Learning for Multi-Agent Systems with Continuous Action Spaces

Multi-agent control problems constitute an interesting area of application for deep reinforcement learning models with continuous action spaces. Such real-world applications, however, typically come with critical safety constraints that must not be violated. In order to ensure safety, we enhance the well-known multi-agent deep deterministic policy gradient (MADDPG) framework by adding a safety layer to the deep policy network. In particular, we extend the idea of linearizing the single-step transition dynamics, as was done for single-agent systems in Safe DDPG (Dalal et al., 2018), to multi-agent settings. We additionally propose to circumvent infeasibility problems in the action correction step using soft constraints (Kerrigan & Maciejowski, 2000). Results from the theory of exact penalty functions can be used to guarantee constraint satisfaction of the soft constraints under mild assumptions. We empirically find that the soft formulation achieves a dramatic decrease in constraint violations, making safety available even during the learning procedure.
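The action correction step described in the abstract can be illustrated with a minimal sketch. It assumes a single linearized constraint of the form c + g·a <= limit on the joint action a, as in the single-agent safety layer of Dalal et al. (2018), and replaces the hard constraint with an L1 penalty of weight rho. The function name `correct_action`, the parameter names, and the restriction to one constraint are illustrative assumptions, not the authors' implementation; with several constraints a quadratic program solver would be needed instead of this closed form.

```python
def correct_action(action, g, c, limit, rho):
    """Soft-constrained correction of a joint action (illustrative sketch).

    Solves  min_a 0.5 * ||a - action||^2 + rho * max(0, c + g.a - limit)
    for ONE linearized constraint. All names here are hypothetical.

    action : proposed joint action from the policy (list of floats)
    g      : constraint sensitivity to the action (list of floats)
    c      : current value of the constraint signal
    limit  : upper bound the constraint signal should respect
    rho    : penalty weight on the constraint violation (slack)
    """
    # Predicted violation if the uncorrected action were applied.
    violation = c + sum(gi * ai for gi, ai in zip(g, action)) - limit
    g_norm_sq = sum(gi * gi for gi in g)

    # Already feasible (or the constraint does not depend on the action).
    if violation <= 0.0 or g_norm_sq == 0.0:
        return list(action), 0.0

    # The hard-constrained multiplier is violation / ||g||^2. The soft
    # problem caps the multiplier at rho, so it always has a solution.
    lam = min(violation / g_norm_sq, rho)
    corrected = [ai - lam * gi for ai, gi in zip(action, g)]
    return corrected, lam


if __name__ == "__main__":
    # Large rho: the correction lands on the constraint boundary.
    print(correct_action([1.0, 1.0], [1.0, 0.0], 0.0, 0.5, rho=10.0))
    # Small rho: the correction is only partial and some violation remains.
    print(correct_action([1.0, 1.0], [1.0, 0.0], 0.0, 0.5, rho=0.2))
```

In this sketch, whenever rho is at least as large as the multiplier of the hard-constrained problem, the soft solution coincides with the hard one. This is the exact-penalty property the abstract refers to, here shown only for the one-constraint case.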

[1] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[2] Ufuk Topcu, et al. Safe Reinforcement Learning via Shielding, 2017, AAAI.

[3] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.

[4] Donald Goldfarb, et al. A numerically stable dual method for solving strictly convex quadratic programs, 1983, Math. Program.

[5] Yi Wu, et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, 2017, NIPS.

[6] Vijay Kumar, et al. Learning Safe Unlabeled Multi-Robot Planning with Motion Constraints, 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[7] Kim Peter Wabersich, et al. Linear Model Predictive Safety Certification for Learning-Based Control, 2018, 2018 IEEE Conference on Decision and Control (CDC).

[8] Barry Lennox, et al. Voronoi-Based Multi-Robot Autonomous Exploration in Unknown Environments via Deep Reinforcement Learning, 2020, IEEE Transactions on Vehicular Technology.

[9] Yuval Tassa, et al. Safe Exploration in Continuous Action Spaces, 2018, ArXiv.

[10] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.

[11] Mohammad Ghavamzadeh, et al. Lyapunov-based Safe Policy Optimization for Continuous Control, 2019, ArXiv.

[12] Sergey Levine, et al. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[13] Osbert Bastani, et al. MAMPS: Safe Multi-Agent Reinforcement Learning via Model Predictive Shielding, 2019, ArXiv.

[14] Etienne Perot, et al. Deep Reinforcement Learning framework for Autonomous Driving, 2017, Autonomous Vehicles and Machines.

[15] J. Maciejowski, et al. Soft constraints and exact penalty functions in model predictive control, 2000.

[16] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[17] Jianfeng Gao, et al. Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear, 2016, ArXiv.

[18] Eitan Altman, et al. Constrained Markov decision processes with total cost criteria: Lagrangian approach and dual linear program, 1998, Math. Methods Oper. Res.

[19] Craig Boutilier, et al. Data center cooling using model-predictive control, 2018, NeurIPS.

[20] Pieter Abbeel, et al. Emergence of Grounded Compositional Language in Multi-Agent Populations, 2017, AAAI.