Coordinated exception handling in distributed object systems: from model to system implementation

Exception handling in concurrent and distributed programs is difficult but often necessary, and traditional exception mechanisms designed for sequential programs are in many cases no longer appropriate. One major difficulty is that handling an exception may need to involve multiple concurrent components that are cooperating in pursuit of some global goal. Another complication is that several exceptions may be raised concurrently in different nodes of a distributed environment. Existing proposals and actual concurrent languages either ignore these difficulties or cope with only a limited form of them. This paper attempts a general solution, developed especially for distributed object systems, that starts from a conceptual model, provides algorithms for coordinating concurrent components and resolving multiple exceptions, and carries through to an actual system implementation. An industrial production cell is used as a case study to demonstrate the usefulness of the proposed model and algorithms. A system that supports coordinated atomic actions and exception resolution is implemented in distributed Ada 95 and examined through several performance-related experiments.
