Safe-visor Architecture for Sandboxing (AI-based) Unverified Controllers in Stochastic Cyber-Physical Systems

∗Corresponding author. Email addresses: bingzhuo.zhong@tum.de (Bingzhuo Zhong), alavaei@ethz.ch (Abolfazl Lavaei), cao.hongpeng@tum.de (Hongpeng Cao), majid.zamani@colorado.edu (Majid Zamani), mcaccamo@tum.de (Marco Caccamo)

Preprint submitted to Elsevier, February 11, 2021. arXiv:2102.05490v1 [eess.SY], 10 Feb 2021.

High-performance but unverified controllers, e.g., artificial-intelligence-based (AI-based) controllers, are widely employed in cyber-physical systems (CPSs) to accomplish complex control missions. However, guaranteeing the safety and reliability of CPSs with such controllers, which is of vital importance in many real-life safety-critical applications, is currently very challenging. To cope with this difficulty, we propose a Safe-visor architecture for sandboxing unverified controllers in CPSs operating in noisy environments (a.k.a. stochastic CPSs). The proposed architecture contains a history-based supervisor, which checks inputs from the unverified controller and makes a compromise between functionality and safety of the system, and a safety advisor, which provides a fallback when the unverified controller endangers the safety of the system. By employing this architecture, we provide formal probabilistic guarantees on preserving safety specifications expressed by accepting languages of deterministic finite automata (DFA). Meanwhile, the unverified controllers can still be employed in the control loop even though they are not reliable. We demonstrate the effectiveness of our proposed results by applying them to two (physical) case studies.
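The interplay between the supervisor and the safety advisor can be illustrated with a minimal sketch. It is not the acceptance rule of the paper, which the abstract does not spell out. It assumes a hypothetical one-step risk estimate `toy_risk` and a simple rule in which the supervisor accepts the unverified input as long as the risk accumulated along the history stays within a tolerated budget, and otherwise applies the advisor's input. All names (`HistoryBasedSupervisor`, `toy_risk`, `budget`, `spent`) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class HistoryBasedSupervisor:
    """Toy supervisor: accepts unverified inputs while a risk budget lasts.

    Assumption: accumulated one-step risks are used as a crude bound on the
    probability of violating safety. This is an illustrative rule only.
    """
    risk: Callable[[float, float], float]  # hypothetical one-step risk estimate
    budget: float                          # tolerated total violation probability
    spent: float = 0.0                     # risk accumulated along the history

    def decide(self, state: float, u_unverified: float,
               u_advisor: float) -> Tuple[float, bool]:
        """Return (input to apply, whether the unverified input was accepted)."""
        r_unverified = self.risk(state, u_unverified)
        if self.spent + r_unverified <= self.budget:
            # Functionality is preferred while the budget allows it.
            self.spent += r_unverified
            return u_unverified, True
        # Fallback: the safety advisor's input is applied instead.
        # Simplification: the advisor's own risk is charged without a check.
        self.spent += self.risk(state, u_advisor)
        return u_advisor, False


def toy_risk(state: float, u: float) -> float:
    """Hypothetical risk model: larger inputs are riskier, capped at 1."""
    return min(1.0, 0.04 * abs(u))


supervisor = HistoryBasedSupervisor(risk=toy_risk, budget=0.1)
# The unverified controller always proposes u = 1.0; the advisor proposes 0.0.
decisions = [supervisor.decide(0.0, 1.0, 0.0) for _ in range(3)]
```

In this toy run the decision depends on the history: the same proposed input is accepted early on and rejected once the accumulated risk would exceed the budget, at which point the advisor's input is applied.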
