Rigorous development of a safety-critical system based on coordinated atomic actions

This paper describes our experience using coordinated atomic (CA) actions as a system structuring tool to design and validate a sophisticated control system for a complex industrial application that has high reliability and safety requirements. Our study is based on the "Fault-Tolerant Production Cell", which represents a manufacturing process involving redundant mechanical devices (provided in order to enable continued production in the presence of machine faults). The challenge posed by the model specification is to design a control system that maintains specified safety and liveness properties even in the presence of a large number and variety of device and sensor failures. We discuss in this paper: i) a design for a control program that uses CA actions to deal with both safety-related and fault tolerance concerns, and ii) the formal verification of this design based on the use of model checking. We found that CA action structuring facilitated both the design and verification tasks by enabling the various safety problems (e.g. clashes of moving machinery) to be treated independently. The formal verification activity was performed in parallel with the design activity, and the interaction between them resulted in a combined exercise in "design for validation".
