Protocol for groups of pseudo-active replicated objects

Distributed applications are realized by the cooperation of multiple objects that communicate in a client-server style. Server objects are replicated on multiple computers to achieve fault tolerance. In active replication, all the replicated server objects (replicas) receive the same requests in the same order from the client objects, invoke the same operations (methods), and send back responses. These replicas might be placed on different kinds of computers with different processing speeds. In addition, these computers might be connected to different networks, i.e., the replicas might be distributed across a WAN. To apply active replication to such a heterogeneous environment, this paper proposes pseudo-active replication, in which a client object accepts only the first response from the replicas. To reduce the recovery time caused by the difference in processing speeds among the replicas, two techniques are introduced. One detects the fastest replica; the other lets the slower replicas catch up with the fastest one: requests for identity and idempotent operations (methods) are not invoked in the slower replicas. Furthermore, to reduce the response time for requests from client objects, requests for compatible operations (methods) are invoked in different orders in the replicas. To support WAN environments, the order is decided based on the round-trip time between client objects and replicas. These techniques are realized by piggybacking additional information on the control messages of the ordering protocol, i.e., the proposed protocol requires no additional messages.
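The first-response rule and the catch-up idea can be illustrated with a minimal Python sketch. It is not the paper's protocol: the names `Request`, `Replica`, `Client`, `catch_up`, and `IDENTITY_OPS`, and the toy operations `add`, `set`, and `read`, are all assumptions made for illustration. The sketch skips only identity operations (those that never change replica state), because the abstract does not state the exact condition under which idempotent operations may be skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    seq: int      # position in the agreed total order
    op: str       # operation (method) name
    arg: int = 0


# Hypothetical classification: operations that never change replica state.
IDENTITY_OPS = {"read"}


@dataclass
class Replica:
    name: str
    state: int = 0
    invoked: list = field(default_factory=list)  # seq numbers actually invoked

    def invoke(self, req: Request) -> int:
        """Invoke one operation and return the response value."""
        self.invoked.append(req.seq)
        if req.op == "add":
            self.state += req.arg
        elif req.op == "set":
            self.state = req.arg
        return self.state  # "read" returns the state unchanged


class Client:
    def __init__(self) -> None:
        self.accepted = {}  # seq -> (replica name, value)

    def on_response(self, seq: int, replica_name: str, value: int) -> bool:
        """Accept only the first response per request; ignore later ones."""
        if seq in self.accepted:
            return False
        self.accepted[seq] = (replica_name, value)
        return True


def catch_up(replica: Replica, pending: list, answered: set) -> None:
    """Apply pending requests in order, skipping identity requests that a
    faster replica has already answered (assumed rule for this sketch)."""
    for req in pending:
        if req.op in IDENTITY_OPS and req.seq in answered:
            continue
        replica.invoke(req)


requests = [Request(1, "add", 5), Request(2, "read"),
            Request(3, "set", 7), Request(4, "read")]

client = Client()
fast = Replica("fast")
for r in requests:
    client.on_response(r.seq, fast.name, fast.invoke(r))

slow = Replica("slow")
catch_up(slow, requests, set(client.accepted))
late_accepted = client.on_response(1, slow.name, 5)  # late duplicate response
```

In this run the slower replica invokes only the two state-changing requests yet ends in the same state as the faster one, and the client discards the late response.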
