ShieldNN: A Provably Safe NN Filter for Unsafe NN Controllers

Authors: Ferlez, James; Elnaggar, Mahmoud; Shoukry, Yasser; Fleming, Cody

Abstract: In this paper, we consider the problem of creating a safe-by-design Rectified Linear Unit (ReLU) Neural Network (NN) that, when composed with an arbitrary control NN, makes the composition provably safe. In particular, we propose ShieldNN, an algorithm that synthesizes such NN filters to safely correct control inputs generated for the continuous-time Kinematic Bicycle Model (KBM). ShieldNN makes two main novel contributions: first, it is based on a novel Barrier Function (BF) for the KBM; and second, it is itself a provably sound algorithm that leverages this BF to design a safety filter NN with safety guarantees. Moreover, since the KBM is known to approximate the dynamics of four-wheeled vehicles well, we demonstrate the efficacy of ShieldNN filters in CARLA simulations of four-wheeled vehicles. In particular, we examined the effect of ShieldNN filters on controllers trained via Deep Reinforcement Learning (RL) in the presence of individual pedestrian obstacles. The safety properties of ShieldNN were borne out in our experiments: the ShieldNN filter reduced the number of obstacle collisions by 99.4%-100%. We also studied the effect of incorporating ShieldNN during training: for a fixed number of episodes, 28% less reward was observed when ShieldNN was not used during training. This suggests that ShieldNN has the further property of improving sample efficiency during RL training.
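
For reference, the Kinematic Bicycle Model mentioned in the abstract is commonly written as follows; this is the standard formulation, and the exact state parameterization used by ShieldNN's barrier function may differ:

```latex
\begin{aligned}
\dot{x}    &= v \cos(\psi + \beta), \\
\dot{y}    &= v \sin(\psi + \beta), \\
\dot{\psi} &= \frac{v}{\ell_r} \sin\beta, \\
\dot{v}    &= a, \qquad
\beta = \tan^{-1}\!\left( \frac{\ell_r}{\ell_f + \ell_r} \tan\delta \right),
\end{aligned}
```

where $(x, y)$ is the position of the vehicle's center of mass, $\psi$ its heading, $v$ its speed, $a$ the acceleration input, $\delta$ the front-wheel steering input, and $\ell_f, \ell_r$ the distances from the center of mass to the front and rear axles.

The safety-filter composition described in the abstract can be illustrated with a minimal sketch. Everything below is a hypothetical placeholder chosen only to make the pattern concrete: the barrier function, the grid-search correction rule, and all constants are assumptions, not the ShieldNN construction (which instead realizes the correction as a provably safe ReLU network derived from its barrier function).

```python
import numpy as np

# Hypothetical illustration of the safety-filter composition pattern:
# a learned controller proposes an action, and a filter replaces it with the
# nearest action whose one-step successor satisfies a discrete-time barrier
# condition h(x') >= alpha * h(x).  This is NOT the ShieldNN construction.

L_R, L_F = 1.5, 1.5   # rear/front axle distances (assumed values)
DT = 0.05             # Euler integration step (assumed value)

def kbm_step(state, delta, a):
    """One Euler step of the kinematic bicycle model."""
    x, y, psi, v = state
    beta = np.arctan(L_R / (L_F + L_R) * np.tan(delta))
    return np.array([
        x + DT * v * np.cos(psi + beta),
        y + DT * v * np.sin(psi + beta),
        psi + DT * (v / L_R) * np.sin(beta),
        v + DT * a,
    ])

def barrier(state, obstacle, r_min=2.0):
    """Placeholder barrier: positive iff the vehicle is more than r_min away."""
    x, y, _, _ = state
    return np.hypot(x - obstacle[0], y - obstacle[1]) - r_min

def safety_filter(state, proposed, obstacle, alpha=0.9, n_grid=41):
    """Return the admissible (steering, acceleration) pair closest to `proposed`.

    The admissible set is approximated by gridding the control box and keeping
    controls whose successor satisfies the barrier condition; if no grid point
    qualifies, the proposed control is returned unchanged."""
    delta0, a0 = proposed
    best, best_cost = proposed, np.inf
    for delta in np.linspace(-0.5, 0.5, n_grid):
        for a in np.linspace(-3.0, 3.0, n_grid):
            nxt = kbm_step(state, delta, a)
            if barrier(nxt, obstacle) >= alpha * barrier(state, obstacle):
                cost = (delta - delta0) ** 2 + (a - a0) ** 2
                if cost < best_cost:
                    best, best_cost = (delta, a), cost
    return best

# Example: a vehicle heading straight toward an obstacle 10 m ahead.
state = np.array([0.0, 0.0, 0.0, 8.0])      # x, y, heading, speed
corrected = safety_filter(state, (0.0, 1.0), obstacle=(10.0, 0.0))
```

The grid search above stands in for the online optimization usually posed in the Control Barrier Function literature; ShieldNN's contribution, per the abstract, is to synthesize the corrective map itself as a ReLU network with safety guarantees, so the filter can be evaluated as a network at run time.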
