Learning Deep Neural Network Controllers for Dynamical Systems with Safety Guarantees: Invited Paper

There is recent interest in using deep neural networks (DNNs) to control autonomous cyber-physical systems (CPSs). One challenge with this approach is that many autonomous CPS applications are safety-critical, and it is not clear whether DNN controllers can guarantee safe system behavior. To address this problem, we present an approach that modifies existing (deep) reinforcement learning algorithms to guide the training of these controllers so that the overall system is safe. We present a novel verification-in-the-loop training algorithm that uses the formalism of barrier certificates to synthesize DNN controllers that are safe by design. We present a proof-of-concept evaluation of our technique on multiple CPS examples.
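
For readers unfamiliar with the formalism, the following is a sketch of the barrier certificate conditions as they are commonly stated in the literature; it is not necessarily the exact formulation used in this paper. The symbols are assumptions introduced here for illustration: closed-loop dynamics $\dot{x} = f(x, \pi_\theta(x))$ under a DNN controller $\pi_\theta$, a state domain $X$, an initial set $X_0 \subseteq X$, an unsafe set $X_u \subseteq X$, and a differentiable function $B : X \to \mathbb{R}$.

```latex
\begin{align*}
  B(x) &\le 0 && \forall x \in X_0, \\
  B(x) &> 0   && \forall x \in X_u, \\
  \frac{\partial B}{\partial x}(x)\, f\bigl(x, \pi_\theta(x)\bigr) &\le 0 && \forall x \in X.
\end{align*}
```

If such a $B$ exists and trajectories remain in $X$, then $B$ is non-increasing along every closed-loop trajectory. A trajectory that starts in $X_0$ therefore keeps $B \le 0$ and can never enter $X_u$, where $B > 0$.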
