Concurrent Exception Handling and Resolution in Distributed Object Systems

We address the problem of handling exceptions in distributed object systems. In a distributed computing environment, exceptions may be raised simultaneously on different processing nodes and must therefore be handled in a coordinated manner; mishandling concurrent exceptions can lead to catastrophic consequences. We take two kinds of concurrency into account: 1) several objects are designed collectively and invoked concurrently to achieve a global goal, and 2) multiple objects (or object groups) that are designed independently compete for the same system resources. We propose a new distributed algorithm for resolving concurrent exceptions and show that it works correctly even in complex nested situations. It improves on previous proposals in that it requires only $O(n_{\max} N^2)$ messages, thereby permitting a quicker response to exceptions.
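To make the idea of resolving concurrent exceptions concrete, the following is a minimal, single-process sketch. It is not the distributed algorithm proposed here: it assumes, purely for illustration, that the exceptions of one cooperative action form a class hierarchy and that concurrently raised exceptions are resolved to the most specific class covering all of them. The exception names (`ActionError`, `SensorError`, and so on) and the function `resolve` are hypothetical.

```python
class ActionError(Exception):
    """Hypothetical root of the exceptions one cooperative action can raise."""


class SensorError(ActionError):
    pass


class SensorTimeout(SensorError):
    pass


class SensorRange(SensorError):
    pass


class ActuatorError(ActionError):
    pass


def resolve(raised):
    """Return the most specific exception class covering every raised exception.

    Assumption: the class hierarchy plays the role of a resolution tree, so
    the resolving exception is the nearest common ancestor class.
    """
    if not raised:
        raise ValueError("no exceptions to resolve")
    first, *rest = raised
    # Walk the ancestors of the first exception, most specific first,
    # and stop at the first class that also covers all the others.
    for candidate in type(first).__mro__:
        if all(isinstance(exc, candidate) for exc in rest):
            return candidate
    raise AssertionError("unreachable: 'object' covers every instance")


if __name__ == "__main__":
    # Two sensor faults raised concurrently resolve to their shared parent.
    print(resolve([SensorTimeout(), SensorRange()]).__name__)
    # Faults from unrelated subtrees resolve to the root of the action.
    print(resolve([SensorTimeout(), ActuatorError()]).__name__)
```

In a distributed setting each participating object would additionally have to learn which exceptions its peers raised before a handler for the resolving exception can be chosen; that exchange is what the message complexity stated above counts, and it is not modeled in this sketch.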
