Paper Information - Provably Correct Training of Neural Network Controllers Using Reachability Analysis

In this paper, we consider the problem of training neural network (NN) controllers for cyber-physical systems (CPS) that are guaranteed to satisfy safety and liveness properties. Our approach combines model-based design methodologies for dynamical systems with data-driven approaches. Given a mathematical model of the dynamical system, we compute a finite-state abstract model that captures the closed-loop behavior under all possible neural network controllers. Using this finite-state abstract model, our framework identifies the subset of NN weights that are guaranteed to satisfy the safety requirements. During training, we augment the learning algorithm with an NN weight projection operator that enforces the resulting NN to be provably safe. To account for the liveness properties, the proposed framework uses the finite-state abstract model to identify candidate NN weights that may satisfy the liveness properties, and then biases the NN training toward these candidates to achieve the liveness specification. These guarantees cannot be ensured without correctness guarantees on the NN architecture, which controls the NN's expressiveness. Therefore, a cornerstone of the proposed framework is the ability to select provably correct NN architectures automatically.
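The idea of augmenting training with a weight projection operator can be illustrated with a minimal projected-gradient sketch. This is not the paper's algorithm: the abstract does not specify the shape of the safe weight set or the form of the projection, so the sketch assumes a hypothetical safe set given as an axis-aligned box (one interval per weight), a linear controller u = w . x, and a mean-squared-error imitation loss. All function and variable names are illustrative.

```python
from typing import List, Sequence, Tuple

# Hypothetical safe set: one (low, high) interval per weight.
# In the paper the safe weights come from the finite-state abstraction;
# the box here is only an assumed stand-in.
Box = List[Tuple[float, float]]
Sample = Tuple[Sequence[float], float]  # (state x, target control u)


def project_onto_box(weights: Sequence[float], safe_box: Box) -> List[float]:
    """Euclidean projection onto a box: clip each weight to its interval."""
    return [min(max(w, lo), hi) for w, (lo, hi) in zip(weights, safe_box)]


def loss_gradient(weights: Sequence[float], data: Sequence[Sample]) -> List[float]:
    """Gradient of the mean squared error of the linear controller u = w . x."""
    n = len(data)
    grad = [0.0] * len(weights)
    for x, u_target in data:
        u = sum(w * xi for w, xi in zip(weights, x))
        err = u - u_target
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / n
    return grad


def train_projected(
    weights: Sequence[float],
    data: Sequence[Sample],
    safe_box: Box,
    lr: float = 0.1,
    steps: int = 200,
) -> List[float]:
    """Gradient descent where every iterate is projected back into the safe set."""
    current = project_onto_box(weights, safe_box)
    for _ in range(steps):
        grad = loss_gradient(current, data)
        stepped = [w - lr * g for w, g in zip(current, grad)]
        current = project_onto_box(stepped, safe_box)  # enforce safety after each step
    return current


# Demonstration data generated by the gains (-2.0, -0.5), which lie outside the box.
data = [((1.0, 0.0), -2.0), ((0.0, 1.0), -0.5), ((1.0, 1.0), -2.5)]
safe_box = [(-1.0, 0.0), (-1.0, 0.0)]
trained = train_projected([0.0, 0.0], data, safe_box)
print(trained)
```

In this toy run the unconstrained optimum violates the assumed safe box, so the first weight ends up pinned to the box boundary while every iterate stays inside the safe set. The design choice being illustrated is that safety is enforced by construction at each update rather than checked after training.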

Yasser Shoukry | Xiaowu Sun
