AQuA: An Adaptive Architecture that Provides Dependable Distributed Objects

Building dependable distributed systems from commercial off-the-shelf components is of growing practical importance. For both cost and production reasons, there is interest in approaches and architectures that facilitate building such systems. The AQuA architecture is one such approach; its goal is to provide adaptive fault tolerance to CORBA applications by replicating objects. AQuA allows application programmers to request desired levels of dependability at runtime. It provides fault tolerance mechanisms to ensure that a CORBA client can always obtain reliable services, even if the CORBA server object that provides the desired services suffers from crash failures and value faults. AQuA includes a replicated dependability manager that configures the system in response to applications' requests and to changes in system resources caused by faults. It uses Maestro/Ensemble to provide group communication services. It contains a gateway that intercepts standard CORBA IIOP messages, so that any standard CORBA application can use AQuA. It provides several replication schemes for forwarding messages reliably to the remote replicated objects, all of which ensure strong data consistency among replicas. This paper describes the AQuA architecture and presents, in detail, the active replication pass-first scheme. It also describes the interface to the dependability manager and the design of the dependability manager's replication. Finally, we describe performance measurements conducted for the active replication pass-first scheme, and we present results from our study of fault detection, recovery, and blocking times.
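
To make the pass-first idea concrete, the following is a minimal sketch, not AQuA's implementation. It assumes that "pass-first" means the client-side gateway forwards each request to every replica, passes the first reply it receives to the client, and discards later duplicates. The names `PassFirstGateway`, `on_reply`, `crashed`, and `healthy` are hypothetical, replicas are modeled as plain in-process functions rather than CORBA objects reached through group communication, and only crash failures are modeled; tolerating value faults would need an additional mechanism that is not shown.

```python
from typing import Callable, List, Optional, Set

# A replica is modeled as a function from request to reply (an assumption
# made for illustration; real replicas are remote CORBA objects).
Replica = Callable[[str], str]


class PassFirstGateway:
    """Hypothetical gateway: send to all replicas, pass only the first reply."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        self.answered: Set[int] = set()  # request ids already passed on
        self.next_id = 0

    def on_reply(self, request_id: int, reply: str) -> Optional[str]:
        """Return the reply if it is the first for this request, else None."""
        if request_id in self.answered:
            return None  # duplicate from another replica: discard
        self.answered.add(request_id)
        return reply

    def invoke(self, request: str) -> str:
        request_id = self.next_id
        self.next_id += 1
        passed: Optional[str] = None
        for replica in self.replicas:
            try:
                reply = replica(request)
            except ConnectionError:
                continue  # a crashed replica simply contributes no reply
            first = self.on_reply(request_id, reply)
            if first is not None:
                passed = first
        if passed is None:
            raise RuntimeError("no replica replied")
        return passed


def crashed(request: str) -> str:
    """Stand-in for a replica that has suffered a crash failure."""
    raise ConnectionError("replica crashed")


def healthy(request: str) -> str:
    """Stand-in for a correct replica."""
    return request.upper()
```

With one crashed and two healthy replicas, `PassFirstGateway([crashed, healthy, healthy]).invoke("ping")` still returns a reply, and the second healthy replica's identical reply is dropped as a duplicate.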
