Safe exploration of nonlinear dynamical systems: A predictive safety filter for reinforcement learning

The transfer of reinforcement learning (RL) techniques into real-world applications is challenged by safety requirements in the presence of physical limitations. Most RL methods, in particular the most popular algorithms, do not explicitly consider state and input constraints. In this paper, we address this problem for nonlinear systems with continuous state and input spaces by introducing a predictive safety filter, which turns a constrained dynamical system into an unconstrained safe system to which any RL algorithm can be applied 'out-of-the-box'. The predictive safety filter receives the proposed learning input and decides, based on the current system state, whether it can be safely applied to the real system or whether it must be modified. Safety is established by a continuously updated safety policy, which is based on a model predictive control formulation that uses a data-driven system model and accounts for state- and input-dependent uncertainties.
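The filtering step can be illustrated with a minimal sketch. It is not the paper's method. The scalar dynamics x_{k+1} = x_k + 0.1 (sin x_k + u_k), the constraint bounds, the terminal set, and all function names (`step`, `backup_policy`, `is_safe`, `safety_filter`) are hypothetical choices made for illustration. In place of the model predictive control optimization over a full input sequence, the sketch uses a simpler substitute: it searches a grid of candidate first inputs and follows each with a fixed saturated backup feedback for the rest of the horizon. It also uses a nominal model, so the learned model and its uncertainty bounds are omitted. An input counts as certified if the predicted trajectory satisfies the state and input constraints and ends in a terminal set that is invariant under the backup law.

```python
import math

# Hypothetical toy system (not from the paper): x_{k+1} = x_k + DT * (sin(x_k) + u_k)
DT = 0.1
X_MAX = 1.0        # state constraint |x| <= X_MAX
U_MAX = 1.0        # input constraint |u| <= U_MAX
X_TERMINAL = 0.5   # terminal set |x| <= X_TERMINAL (invariant under backup_policy)
HORIZON = 30       # prediction horizon N


def step(x: float, u: float) -> float:
    """Nominal one-step model; model uncertainty is ignored in this sketch."""
    return x + DT * (math.sin(x) + u)


def backup_policy(x: float) -> float:
    """Fixed saturated linear feedback used for the remainder of the horizon."""
    return max(-U_MAX, min(U_MAX, -2.0 * x))


def is_safe(x: float, u0: float) -> bool:
    """Apply u0 now, then the backup law for HORIZON - 1 steps; require all
    constraints to hold and the final predicted state to lie in the terminal set."""
    if abs(u0) > U_MAX:
        return False
    x = step(x, u0)
    if abs(x) > X_MAX:
        return False
    for _ in range(HORIZON - 1):
        x = step(x, backup_policy(x))  # backup inputs always satisfy |u| <= U_MAX
        if abs(x) > X_MAX:
            return False
    return abs(x) <= X_TERMINAL


def safety_filter(x: float, u_learn: float, resolution: int = 200) -> float:
    """Pass u_learn through if it is certified safe; otherwise return the closest
    certified input from a uniform grid over [-U_MAX, U_MAX]."""
    if is_safe(x, u_learn):
        return u_learn
    grid = [-U_MAX + 2.0 * U_MAX * i / resolution for i in range(resolution + 1)]
    for u in sorted(grid, key=lambda v: abs(v - u_learn)):
        if is_safe(x, u):
            return u
    # No certified candidate found: fall back to the backup input (no guarantee).
    return backup_policy(x)
```

Near the origin, a learning input such as `safety_filter(0.0, 0.5)` passes through unchanged. At x = 0.99, the proposed input 1.0 would leave the state constraint in one step, so the filter returns the nearest certified grid input, approximately -0.74. The RL agent only ever proposes inputs, and the filter alone decides what reaches the system. This separation is what lets an unmodified RL algorithm run on top of it.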
