论文信息 - Safe Reinforcement Learning via Formal Methods: Toward Safe Control Through Proof and Learning

Safe Reinforcement Learning via Formal Methods: Toward Safe Control Through Proof and Learning

Formal verification provides a high degree of confidence in safe system operation, but only if reality matches the verified model. Although a good model will be accurate most of the time, even the best models are incomplete. This is especially true in Cyber-Physical Systems because high-fidelity physical models of systems are expensive to develop and often intractable to verify. Conversely, reinforcement learning-based controllers are lauded for their flexibility in unmodeled environments, but do not provide guarantees of safe operation. This paper presents an approach for provably safe learning that provides the best of both worlds: the exploration and optimization capabilities of learning along with the safety guarantees of formal verification. Our main insight is that formal verification combined with verified runtime monitoring can ensure the safety of a learning agent. Verification results are preserved whenever learning agents limit exploration within the confounds of verified control choices as long as observed reality comports with the model used for off-line verification. When a model violation is detected, the agent abandons efficiency and instead attempts to learn a control strategy that guides the agent to a modeled portion of the state space. We prove that our approach toward incorporating knowledge about safe control into learning systems preserves safety guarantees, and demonstrate that we retain the empirical performance benefits provided by reinforcement learning. We also explore various points in the design space for these justified speculative controllers in a simple model of adaptive cruise control model for autonomous cars.

Nathan Fulton | André Platzer | Nathan Fulton | André Platzer

[1] Mykel J. Kochenderfer,et al. Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks , 2017, CAV.

[2] Nathan Fulton,et al. KeYmaera X: An Axiomatic Tactical Theorem Prover for Hybrid Systems , 2015, CADE.

[3] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.

[4] Hoyt Lougee,et al. SOFTWARE CONSIDERATIONS IN AIRBORNE SYSTEMS AND EQUIPMENT CERTIFICATION , 2001 .

[5] Pieter Abbeel,et al. Safe Exploration in Markov Decision Processes , 2012, ICML.

[6] André Platzer,et al. A Complete Uniform Substitution Calculus for Differential Dynamic Logic , 2016, Journal of Automated Reasoning.

[7] Peter Geibel,et al. Reinforcement Learning for MDPs with Constraints , 2006, ECML.

[8] Laurent El Ghaoui,et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices , 2005, Oper. Res..

[9] Javier García,et al. A comprehensive survey on safe reinforcement learning , 2015, J. Mach. Learn. Res..

[10] André Platzer,et al. ModelPlex: verified runtime validation of verified cyber-physical system models , 2014, Formal Methods in System Design.

[11] Shie Mannor,et al. Scaling Up Robust MDPs by Reinforcement Learning , 2013, ArXiv.

[12] André Platzer,et al. Differential Dynamic Logic for Hybrid Systems , 2008, Journal of Automated Reasoning.

[13] André Platzer,et al. Adaptive Cruise Control: Hybrid, Distributed, and Now Formally Verified , 2011, FM.

[14] Masami Yasuda,et al. Discounted Markov decision processes with utility constraints , 2006, Comput. Math. Appl..

[15] André Platzer,et al. The Complete Proof Theory of Hybrid Systems , 2012, 2012 27th Annual IEEE Symposium on Logic in Computer Science.

[16] Saso Dzeroski,et al. Integrating Guidance into Relational Reinforcement Learning , 2004, Machine Learning.

[17] Thomas A. Henzinger,et al. Hybrid Automata: An Algorithmic Approach to the Specification and Verification of Hybrid Systems , 1992, Hybrid Systems.

[18] Nidhi Kalra,et al. Driving to Safety , 2016 .

[19] Matthias Heger,et al. Consideration of risk in reinformance learning , 1994, ICML 1994.

[20] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[21] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[22] André Platzer,et al. Logics of Dynamical Systems , 2012, 2012 27th Annual IEEE Symposium on Logic in Computer Science.