Transparent Fault Tolerance for CORBA

Applications are increasingly being programmed using the CORBA distributed object standard. CORBA's Internet Inter-ORB Protocol (IIOP) and its mediating Object Request Broker (ORB) allow CORBA objects to interact despite differences in their locations, hardware architectures, operating systems and programming languages. The Eternal system provides the fault tolerance that CORBA lacks. Because typical applications are already quite complex, and because typical application programmers are not skilled in fault tolerance, Eternal provides fault tolerance without requiring modification of the application or of complex commercial ORB code. This transparency to both the application and the ORB is made possible by interception technology. The Eternal Interceptor transparently captures the IIOP messages exchanged between the CORBA objects of the application and diverts these messages to the Eternal Replication Mechanisms and Logging-Recovery Mechanisms. Eternal provides fault tolerance through object replication, with support for active and passive replication, duplicate detection and suppression, state transfer, logging and recovery. Using the Totem reliable totally-ordered multicast protocol to communicate IIOP messages between replicated objects facilitates replica consistency. Eternal can also exploit other multicast group communication protocols, such as the SecureRing secure reliable totally-ordered multicast protocol, to support effective majority voting for CORBA applications. Strong replica consistency is ensured for both passive and active replication, as replicas fail and recover, and as operations are performed that update the states of the replicated objects.
Recognizing that most CORBA applications and ORBs employ multithreading, a source of non-determinism, Eternal provides mechanisms to enforce determinism transparently, thereby ensuring replica consistency even for multithreaded applications. Eternal has been deployed on seven different unmodified commercial CORBA ORBs. Unmodified applications that are triply replicated by Eternal incur a 10% overhead in response time compared to their non-fault-tolerant counterparts. The technology of Eternal forms the basis of the forthcoming standard for Fault-Tolerant CORBA.
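
The duplicate detection and suppression mentioned above can be illustrated with a minimal sketch. It is not Eternal's implementation: the `Message` and `DuplicateFilter` classes, the `(group, sequence number)` operation identifier, and the example invocations are all hypothetical and chosen only for illustration. The sketch assumes that every replica of an actively replicated object multicasts the same invocation tagged with the same operation identifier, and that a totally-ordered multicast delivers messages in the same order at every receiver, so each receiver keeps the first copy and discards the rest.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    # Hypothetical operation identifier: (sending object group, sequence number).
    op_id: tuple
    # Name of the replica that sent this particular copy.
    sender: str
    # Stand-in for the intercepted IIOP request.
    payload: str


@dataclass
class DuplicateFilter:
    # Operation identifiers that have already been delivered.
    seen: set = field(default_factory=set)

    def deliver(self, msg):
        """Return the payload for the first copy of an operation, None for a duplicate."""
        if msg.op_id in self.seen:
            return None
        self.seen.add(msg.op_id)
        return msg.payload


def run_demo():
    # Three active replicas (c1, c2, c3) of one client object issue the same
    # two invocations; the list order stands in for the total order of delivery.
    stream = [
        Message(("client", 1), "c1", "deposit(10)"),
        Message(("client", 1), "c2", "deposit(10)"),
        Message(("client", 2), "c1", "withdraw(5)"),
        Message(("client", 1), "c3", "deposit(10)"),
        Message(("client", 2), "c3", "withdraw(5)"),
    ]
    dup_filter = DuplicateFilter()
    delivered = []
    for msg in stream:
        payload = dup_filter.deliver(msg)
        if payload is not None:
            delivered.append(payload)
    return delivered


if __name__ == "__main__":
    print(run_demo())
```

Because every receiver sees the same delivery order, each one suppresses the same copies, so each operation is applied exactly once at every server replica. A real system would also need to bound the set of remembered identifiers, which this sketch does not attempt.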
