Verifiably Safe Off-Model Reinforcement Learning

The desire to use reinforcement learning in safety-critical settings has inspired recent interest in formal methods for learning algorithms. Existing formal methods for learning and optimization primarily address constrained learning or constrained optimization: given a single correct model and an associated safety constraint, these approaches guarantee efficient learning while provably avoiding behaviors that violate the constraint. Acting well under an accurate environmental model is an important prerequisite for safe learning, but it is ultimately insufficient for systems that operate in complex, heterogeneous environments. This paper introduces verification-preserving model updates, the first approach toward formal safety guarantees for reinforcement learning in settings where multiple environmental models must be taken into account. By combining design-time model updates with runtime model falsification, we take a first step toward formal safety proofs for autonomous systems acting in heterogeneous environments.
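
As a rough illustration of how design-time verification and runtime model falsification might fit together, the Python sketch below uses hypothetical names (VerifiedModel, fits, safe_controller, fallback, run_episode are assumptions for illustration, not the paper's API). Each candidate environment model carries a runtime monitor and a controller proven safe under that model at design time; observations falsify inconsistent models, and the learned policy is only trusted while some verified model still explains the data.

```python
# Hedged sketch of runtime model falsification; illustrative only, not the paper's implementation.
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]
Action = float

@dataclass
class VerifiedModel:
    """A candidate environment model with its design-time artifacts (hypothetical structure)."""
    name: str
    fits: Callable[[State, Action, State], bool]    # runtime monitor: does the observed transition fit this model?
    safe_controller: Callable[[State], Action]      # controller proven safe under this model at design time

def run_episode(models: List[VerifiedModel],
                learned_policy: Callable[[State], Action],
                fallback: Callable[[State], Action],
                env_step: Callable[[State, Action], State],
                state: State,
                horizon: int = 100) -> State:
    candidates = list(models)                        # models not yet falsified by observations
    for _ in range(horizon):
        if candidates:
            # Some verified model still explains the data, so a safety proof applies;
            # the learned policy is allowed to act (per-action shielding could be added here).
            action = learned_policy(state)
        else:
            # Every candidate model has been falsified: no proof applies, so revert
            # to a pre-designated conservative fallback controller.
            action = fallback(state)
        next_state = env_step(state, action)
        # Runtime model falsification: drop every model contradicted by the observed transition.
        candidates = [m for m in candidates if m.fits(state, action, next_state)]
        state = next_state
    return state
```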
