Developing Control Software for Production Cell II: Failure Analysis and System Design Using CA Actions

This paper describes our experience using coordinated atomic (CA) actions as a system structuring tool to design a control system for a complex industrial application with high reliability and safety requirements. Our study is based on an extended production cell model whose specification and simulator were defined and developed by FZI (Forschungszentrum Informatik, Germany). This “Fault-Tolerant Production Cell” represents a manufacturing process with redundant mechanical devices, provided so that production can continue in the presence of machine faults. The challenge posed by the model specification is to design a control system that maintains the specified safety and liveness properties even in the presence of a large number and variety of device and sensor failures. In this paper we provide an analysis of possible component failures, describe a control program design that uses CA actions to address both safety-related and fault tolerance concerns, and outline an implementation of the control program.
