Causality and Temporal Dependencies in the Design of Fault Management Systems

Reasoning about causes and effects arises naturally in the engineering of safety-critical systems. A classical example is Fault Tree Analysis, a deductive technique for system safety assessment in which an undesired state is reduced to the set of its immediate causes. The design of fault management systems also requires reasoning about causal relationships. In particular, a fail-operational system must ensure timely detection and identification of faults, i.e., it must recognize the occurrence of run-time faults through their observable effects on the system. More complex scenarios arise when multiple faults are involved and interact in subtle ways. In this work, we propose a formal approach to fault management for complex systems. We first introduce the notions of fault tree and minimal cut set. We then present a formal framework for the specification and analysis of diagnosability, and for the design of fault detection and identification (FDI) components. Finally, we review recent advances in fault propagation analysis based on the Timed Failure Propagation Graph (TFPG) formalism.
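
To make the notion of minimal cut set concrete, the following is a minimal sketch, not the symbolic technique used in this work. It assumes a static fault tree with only AND and OR gates, and the tree encoding, the function names, and the example events (pump_a_fails, pump_b_fails, power_loss) are hypothetical, chosen purely for illustration. A cut set is a set of basic faults that together cause the top-level event; it is minimal if no proper subset is also a cut set.

```python
from itertools import product

# Hypothetical encoding (an assumption for this sketch): a node is either
# a basic-event name (str) or a gate tuple ("AND" | "OR", [children]).

def cut_sets(node):
    """Return all cut sets of `node` as a set of frozensets of basic events."""
    if isinstance(node, str):
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any cut set of any child triggers an OR gate.
        return set().union(*child_sets)
    if gate == "AND":
        # An AND gate needs one cut set from every child, merged together.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate: {gate}")

def minimal_cut_sets(node):
    """Keep only the cut sets that have no proper subset among the cut sets."""
    sets = cut_sets(node)
    return {s for s in sets if not any(t < s for t in sets)}

# Illustrative top-level event: cooling is lost if both pumps fail,
# or if power is lost (the third branch is redundant on purpose).
TOP = ("OR", [
    ("AND", ["pump_a_fails", "pump_b_fails"]),
    "power_loss",
    ("AND", ["power_loss", "pump_a_fails"]),
])

print(sorted(sorted(s) for s in minimal_cut_sets(TOP)))
# prints [['power_loss'], ['pump_a_fails', 'pump_b_fails']]
```

The explicit enumeration above grows exponentially with the size of the tree, so it is only suitable for small examples.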