End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real-world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller, (2) model-based controllers utilizing control barrier functions (CBFs), and (3) online learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and achieves greater policy-exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car following with wireless vehicle-to-vehicle communication, and show that it attains much greater sample efficiency than other state-of-the-art algorithms while maintaining safety throughout the learning process.
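The following is a minimal sketch (not the authors' released code) of the safety-filter idea described above: the RL policy proposes an action, and a CBF-based controller minimally modifies it so that the barrier condition h(x) ≥ 0 keeps holding. The one-dimensional dynamics, the barrier h, and all constants below are illustrative assumptions; the paper additionally learns the unknown part of the dynamics with GPs, which is omitted here.

```python
import numpy as np

def f(x):
    """Drift term of the (assumed) control-affine dynamics x_next = x + dt*(f(x) + g(x)*u).
    Placeholder nominal model; in the paper the model error is learned online with a GP."""
    return 0.5 * x

def g(x):
    """Control gain of the dynamics; assumed known and nonzero in this sketch."""
    return 1.0

def h(x, x_max=1.0):
    """Barrier function: h(x) >= 0 means the state stays below x_max (the safe set)."""
    return x_max - x

def cbf_filter(x, u_rl, dt=0.05, gamma=0.5):
    """Minimally alter the RL action u_rl so the discrete-time CBF condition
        h(x_next) >= (1 - gamma) * h(x)
    holds. With one input and one affine constraint a*u + b >= 0, the QP
        min_u (u - u_rl)^2   s.t.  a*u + b >= 0
    has the closed form below; a multi-input version would call a QP solver."""
    a = -dt * g(x)                 # coefficient of u in h(x_next) - (1 - gamma)*h(x)
    b = gamma * h(x) - dt * f(x)   # constant part of the constraint
    if a * u_rl + b >= 0.0:        # RL action already satisfies the constraint
        return u_rl
    return -b / a                  # otherwise project onto the constraint boundary

# Usage: wrap any RL policy's action before applying it to the plant.
x, u_rl = 0.9, 2.0                 # state near the boundary, aggressive RL action
u_safe = cbf_filter(x, u_rl)
print(u_safe)                      # filtered action respecting h(x_next) >= (1 - gamma)*h(x)
```

The design choice this illustrates is that safety is enforced as a per-step projection of the RL action, so any model-free RL algorithm can be used unchanged underneath the filter.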
