Dynamically Detecting Faults via Integrity Constraints

Control programs for safety-critical systems are required to tolerate faults in the devices they control. In this paper we examine a systematic approach to devising code that detects faulty devices at runtime. The approach is centred on integrity constraints, which are invariants on the state of a system's variables, including its inputs and outputs. Under normal operation integrity constraints always hold, but they are designed to be violated when there is a fault. By adding variables that capture the previous values of variables or the times of significant events, additional integrity constraints can be devised to check for faults in state transitions or in the rate of progress of the system. We discuss techniques for devising integrity constraints and for evaluating them efficiently. When an error is detected through the failure of an integrity constraint, the particular constraint or constraints that failed can help diagnose the likely fault. The techniques are presented by way of a simple case study of controller software written in the action system style, but the approach is equally applicable to other state machine approaches such as Event-B and TLA.
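As an illustration of the three kinds of constraint described above (state, transition and progress), the following is a minimal Python sketch rather than the paper's own case study. The pump controller, all variable names, and the two thresholds are assumptions introduced here purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class State:
    """Hypothetical controller state (all names are illustrative)."""
    pump_on: bool         # output: command sent to the pump
    flow: bool            # input: flow sensor reading
    level: float          # input: tank level sensor reading
    prev_level: float     # added variable: level at the previous sample
    pump_on_since: float  # added variable: time the pump was switched on
    now: float            # current time in seconds


# Assumed thresholds, chosen only for this example.
MAX_LEVEL_STEP = 5.0  # largest plausible level change between samples
FLOW_DEADLINE = 2.0   # seconds allowed for flow to appear after pump start

# Each integrity constraint is a named predicate that should hold
# whenever the devices are working correctly.
CONSTRAINTS: Dict[str, Callable[[State], bool]] = {
    # State constraint: flow is only observed while the pump is on.
    "no_flow_when_pump_off": lambda s: s.pump_on or not s.flow,
    # Transition constraint: uses the previous value of the level.
    "level_changes_gradually":
        lambda s: abs(s.level - s.prev_level) <= MAX_LEVEL_STEP,
    # Progress constraint: uses the recorded time of pump start.
    "flow_starts_in_time":
        lambda s: (not s.pump_on) or s.flow
        or (s.now - s.pump_on_since <= FLOW_DEADLINE),
}


def failed_constraints(s: State) -> List[str]:
    """Return the names of all constraints violated in state s."""
    return [name for name, holds in CONSTRAINTS.items() if not holds(s)]


healthy = State(pump_on=True, flow=True, level=40.0, prev_level=39.0,
                pump_on_since=10.0, now=15.0)
faulty = State(pump_on=True, flow=False, level=40.0, prev_level=39.0,
               pump_on_since=10.0, now=15.0)

print(failed_constraints(healthy))  # no violations expected
print(failed_constraints(faulty))   # only the progress constraint fails
```

In this sketch the name of the failed constraint hints at the likely fault: a violation of `flow_starts_in_time` alone points towards the pump or the flow sensor rather than the level sensor.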
