An inductive synthesis framework for verifiable reinforcement learning

Despite the tremendous advances made over the last decade in developing useful machine-learning applications, their broader adoption has been hindered by the lack of strong assurance guarantees about their behavior. In this paper, we consider how formal verification techniques developed for traditional software systems can be repurposed to verify reinforcement-learning-enabled systems, a particularly important class of machine learning applications. Rather than enforcing safety by examining and altering the structure of a complex neural network implementation, our technique uses black-box methods to synthesize deterministic programs, simpler and more interpretable approximations of the network that nonetheless guarantee that desired safety properties are preserved, even when the network is deployed in unanticipated or previously unobserved environments. Our methodology frames the problem of neural network verification as a counterexample- and syntax-guided inductive synthesis procedure over these programs. The synthesis procedure searches for both a deterministic program and an inductive invariant over an infinite state transition system that represents a specification of the application's control logic. Additional specifications encoding environment-based constraints can be provided to further refine the search space. Synthesized programs, deployed in conjunction with a neural network implementation, dynamically enforce safety conditions by monitoring and preventing potentially unsafe actions proposed by the neural policy. Experimental results over a wide range of cyber-physical applications demonstrate that software-inspired formal verification techniques can realize trustworthy reinforcement learning systems with low overhead.
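To make the runtime enforcement scheme concrete, the Python sketch below illustrates how a synthesized deterministic program and its inductive invariant might shield a neural policy. Everything here is a minimal illustration under assumed names and values: the dynamics model (A, B), the linear fallback program K, the invariant certificate Q, and the shield and invariant_holds functions are hypothetical stand-ins, not artifacts produced by the framework itself.

```python
import numpy as np

# --- Hypothetical synthesized artifacts (illustrative only) ---
# Deterministic program: u = K @ s (a linear controller), together
# with an inductive invariant phi(s) = s @ Q @ s <= 1.  For these
# example dynamics, Q satisfies the discrete Lyapunov condition
# (A + B K)^T Q (A + B K) < Q, so phi is preserved by the program.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])            # assumed environment model: s' = A s + B u
B = np.array([[0.0],
              [0.1]])
K = np.array([[-10.0, -11.0]])        # synthesized fallback program
Q = np.array([[11.53, 1.05],
              [1.05,  1.11]])         # invariant certificate

def invariant_holds(s):
    """phi(s): is the state inside the verified safe region?"""
    return float(s @ Q @ s) <= 1.0

def shield(s, neural_action):
    """Permit the neural policy's action only when the modeled
    successor state stays inside the invariant; otherwise fall
    back to the synthesized deterministic program."""
    s_next = A @ s + B @ neural_action
    if invariant_holds(s_next):
        return neural_action          # neural action deemed safe here
    return K @ s                      # fallback keeps phi inductive

# Example: an aggressive neural action near the boundary is overridden.
s = np.array([0.25, 0.2])
print(shield(s, np.array([50.0])))    # falls back to K @ s, i.e. [-4.7]
```

Because the invariant is inductive under the fallback program, the shield can never get stuck: whenever the modeled successor of the neural action would leave the invariant set, the program's own action is guaranteed to keep the system inside it, so safety is maintained without ever inspecting the network's weights.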
