FalsifAI: Falsification of AI-Enabled Hybrid Control Systems Guided by Time-Aware Coverage Criteria

Modern Cyber-Physical Systems (CPSs) that need to perform complex control tasks (e.g., autonomous driving) increasingly use AI-enabled controllers, mainly based on deep neural networks (DNNs). The quality assurance of such systems is of vital importance, yet their verification can be extremely challenging due to their complexity and uninterpretable decision logic. Falsification is an established approach for CPS quality assurance which, instead of attempting to prove system correctness, aims at finding a time-variant input signal that violates a formal specification describing the desired behavior; it often employs a search-based testing approach that tries to minimize the robustness of the specification, given by its quantitative semantics. However, the guidance provided by robustness is mostly black-box and relates only to the system output; it gives no insight into whether the temporal internal behavior, determined by multiple consecutive executions of the neural network controller, has been explored sufficiently. To bridge this gap, we make an early attempt at exploring the temporal behavior determined by the repeated executions of neural network controllers in hybrid control systems. First, we propose eight time-aware coverage criteria specifically designed for neural network controllers in the context of CPS, which capture three kinds of features by design: the simple temporal activation of a neuron, the continuous activation of a neuron for a given duration, and the differential neuron activation behavior over time. Second, we introduce a falsification framework, named FalsifAI, that exploits the coverage information for better falsification guidance: inputs of the controller that increase the coverage (thus improving the exploration of the DNN behaviors) are prioritized in the exploitation phase of robustness minimization. Our large-scale evaluation over a total of 3 typical CPS tasks, 6 system specifications, 18 DNN models, and more than 12,000 experiment runs demonstrates 1) the advantage of the proposed technique over two state-of-the-art falsification approaches, and 2) the usefulness of the proposed time-aware coverage criteria for effective falsification guidance.
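To make the three criterion families concrete, the sketch below shows one way such time-aware coverage could be tracked over the consecutive executions of a DNN controller inside a simulation loop. This is a minimal illustration, not the paper's implementation (the paper defines eight criteria; only the three named families are shown), and the names ACT_THRESHOLD, MIN_DURATION, and TimeAwareCoverage are hypothetical placeholders.

```python
import numpy as np

ACT_THRESHOLD = 0.5  # assumed activation threshold (hyperparameter)
MIN_DURATION = 3     # assumed minimum run length for "continuous" activation

class TimeAwareCoverage:
    """Tracks three illustrative families of time-aware neuron coverage."""

    def __init__(self, n_neurons):
        self.ever_active = np.zeros(n_neurons, dtype=bool)  # family 1: neuron activated at some step
        self.run_length = np.zeros(n_neurons, dtype=int)    # current consecutive-activation streak
        self.sustained = np.zeros(n_neurons, dtype=bool)    # family 2: active for >= MIN_DURATION steps
        self.toggled = np.zeros(n_neurons, dtype=bool)      # family 3: activation state changed over time
        self.prev_active = None

    def update(self, activation_vector):
        """Feed the hidden-layer activations from one controller execution."""
        active = activation_vector > ACT_THRESHOLD
        self.ever_active |= active
        self.run_length = np.where(active, self.run_length + 1, 0)
        self.sustained |= self.run_length >= MIN_DURATION
        if self.prev_active is not None:
            self.toggled |= active != self.prev_active
        self.prev_active = active

    def score(self):
        """One possible aggregate: the mean of the three per-neuron coverage ratios."""
        return float(np.mean([self.ever_active.mean(),
                              self.sustained.mean(),
                              self.toggled.mean()]))

# Usage sketch: dummy random activations stand in for the hidden layer of the
# DNN controller, sampled once per control step of one simulated trajectory.
cov = TimeAwareCoverage(n_neurons=64)
rng = np.random.default_rng(seed=0)
for _ in range(100):
    cov.update(rng.random(64))
print(f"coverage score: {cov.score():.3f}")
```

In a falsification loop following the abstract's exploration/exploitation idea, candidate input signals whose simulations raise such a coverage score would be favored for further mutation, alongside the usual minimization of the specification's robustness value.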
