MAMPS: Safe Multi-Agent Reinforcement Learning via Model Predictive Shielding

Reinforcement learning is a promising approach to learning control policies for performing complex multi-agent robotics tasks. However, a policy learned in simulation often fails to guarantee even simple safety properties such as obstacle avoidance. To ensure safety, we propose multi-agent model predictive shielding (MAMPS), an algorithm that provably guarantees safety for an arbitrary learned policy. In particular, it operates by using the learned policy as often as possible, but instead uses a backup policy in cases where it cannot guarantee the safety of the learned policy. Using a multi-agent simulation environment, we show how MAMPS can achieve good performance while ensuring safety.

[1]  Ufuk Topcu,et al.  Safe Reinforcement Learning via Shielding , 2017, AAAI.

[2]  Osbert Bastani,et al.  Safe Reinforcement Learning via Online Shielding , 2019, ArXiv.

[3]  Vijay Kumar,et al.  Automated composition of motion primitives for multi-robot systems from safe LTL specifications , 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[4]  John Schulman,et al.  Concrete Problems in AI Safety , 2016, ArXiv.

[5]  Andreas Krause,et al.  Safe Model-based Reinforcement Learning with Stability Guarantees , 2017, NIPS.

[6]  Jakub W. Pachocki,et al.  Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res..

[7]  Yi Wu,et al.  Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.

[8]  Geoffrey A. Hollinger,et al.  HERB: a home exploring robotic butler , 2010, Auton. Robots.

[9]  Jaime F. Fisac,et al.  Bridging Hamilton-Jacobi Safety Analysis and Reinforcement Learning , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[10]  Sergey Levine,et al.  Guided Policy Search , 2013, ICML.

[11]  Vijay Kumar,et al.  Planning Dynamically Feasible Trajectories for Quadrotors Using Safe Flight Corridors in 3-D Complex Environments , 2017, IEEE Robotics and Automation Letters.

[12]  Javier Alonso-Mora,et al.  Parallel autonomy in automated vehicles: Safe motion generation with minimal intervention , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[13]  Pieter Abbeel,et al.  Emergence of Grounded Compositional Language in Multi-Agent Populations , 2017, AAAI.

[14]  Jaime F. Fisac,et al.  Reachability-based safe learning with Gaussian processes , 2014, 53rd IEEE Conference on Decision and Control.

[15]  Insup Lee,et al.  Verisig: verifying safety properties of hybrid systems with neural network controllers , 2018, HSCC.

[16]  Armando Solar-Lezama,et al.  Verifiable Reinforcement Learning via Policy Extraction , 2018, NeurIPS.

[17]  Vijay Kumar,et al.  Learning Safe Unlabeled Multi-Robot Planning with Motion Constraints , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[18]  Vijay Kumar,et al.  Modeling and control of formations of nonholonomic mobile robots , 2001, IEEE Trans. Robotics Autom..

[19]  Javier Alonso-Mora,et al.  Multi-robot navigation in formation via sequential convex programming , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[20]  Kim Peter Wabersich,et al.  Linear Model Predictive Safety Certification for Learning-Based Control , 2018, 2018 IEEE Conference on Decision and Control (CDC).

[21]  Dinesh Manocha,et al.  Reciprocal Velocity Obstacles for real-time multi-agent navigation , 2008, 2008 IEEE International Conference on Robotics and Automation.

[22]  George J. Pappas,et al.  Coordination of Multiple Autonomous Vehicles , 2003 .

[23]  Sergey Levine,et al.  Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[24]  OpenAI Learning Dexterous In-Hand Manipulation. , 2018 .

[25]  Paul A. Beardsley,et al.  Optimal Reciprocal Collision Avoidance for Multiple Non-Holonomic Robots , 2010, DARS.

[26]  Claire J. Tomlin,et al.  Guaranteed Safe Online Learning via Reachability: tracking a ground target using a quadrotor , 2012, 2012 IEEE International Conference on Robotics and Automation.