Falsification of Cyber-Physical Systems Using Deep Reinforcement Learning

With the rapid development of software and distributed computing, Cyber-Physical Systems (CPS) have been widely adopted in many application areas, e.g., smart grids and autonomous vehicles. Detecting defects in CPS models is difficult because of the complexity of both the software and the physical systems involved. Robustness-guided falsification of CPS was introduced to find such defects efficiently. Existing methods use several optimization techniques to generate counterexamples, i.e., inputs under which a given property of the CPS is violated. However, those methods may require a large number of simulation runs to find a counterexample, which makes them far from practical. In this work, we explore state-of-the-art Deep Reinforcement Learning (DRL) techniques to reduce the number of simulation runs required to find such counterexamples. We report our method and the preliminary evaluation results.
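
To make the idea of robustness-guided falsification concrete, the following is a minimal sketch, not the method of this work. It assumes a hypothetical one-dimensional toy system and a safety property "x always stays below 5", and it uses plain stochastic hill climbing in place of DRL. The names `simulate`, `robustness`, and `falsify`, the dynamics, and all parameter values are illustrative assumptions. Robustness is taken as the smallest margin to the limit along a trace, so a negative value means the property is falsified, and the search counts how many simulation runs it spends.

```python
import random


def simulate(inputs, x0=0.0):
    """Toy stand-in for a CPS model (assumed dynamics): x' = 0.9 * x + u."""
    x, trace = x0, []
    for u in inputs:
        x = 0.9 * x + u
        trace.append(x)
    return trace


def robustness(trace, limit=5.0):
    """Robustness of 'always x < limit': the smallest margin along the trace.

    A negative value means the trace violates the property.
    """
    return min(limit - x for x in trace)


def falsify(horizon=20, budget=500, step=0.3, seed=0):
    """Stochastic hill climbing that tries to drive robustness below zero.

    Returns the best input signal found, its robustness, and the number
    of simulation runs used (at most `budget`).
    """
    rng = random.Random(seed)
    best = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
    best_rob = robustness(simulate(best))
    runs = 1
    while best_rob >= 0.0 and runs < budget:
        # Perturb the current best input and keep it within [-1, 1].
        cand = [min(1.0, max(-1.0, u + rng.gauss(0.0, step))) for u in best]
        rob = robustness(simulate(cand))
        runs += 1
        if rob < best_rob:  # keep only candidates closer to a violation
            best, best_rob = cand, rob
    return best, best_rob, runs


if __name__ == "__main__":
    inputs, rob, runs = falsify()
    status = "falsified" if rob < 0.0 else "not falsified"
    print(f"{status}: robustness={rob:.3f} after {runs} simulation runs")
```

The number of simulation runs reported by such a search is the cost that the DRL approach explored here aims to reduce.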
