HMM: A Cluster Membership Service

The Hidra Membership Monitor (HMM) is a distributed service that maintains the current set of active nodes in a cluster of machines. Its protocol detects multiple machine joins or failures in a single reconfiguration, using a small number of messages (the cost is linear in the number of nodes). Such a membership service is needed to detect cluster changes as soon as possible and then to initiate the reconfiguration of the cluster state, in which support for replicated objects has been included. The HMM also manages and synchronises the reconfiguration steps needed by the kernel and the Hidra components of each node, ensuring that all of them take the same steps at the same time. As a result, our system does not need an atomic multicast protocol to deliver the messages in these reconfiguration steps. Together, these services provide the basis for developing reliable intracluster transport protocols and for reducing the reconfiguration time of replicated objects and services.
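
The idea of folding several joins and failures into one reconfiguration, and then driving every node through the same steps, can be illustrated with a small sketch. This is not the HMM protocol itself: the coordinator role, the class and method names, the step names, and the accounting of two messages (request plus acknowledgement) per node per step are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MembershipView:
    """One agreed membership: a view number plus the set of active node ids."""
    view_id: int
    members: frozenset


@dataclass
class Coordinator:
    """Hypothetical coordinator; the abstract does not say how HMM assigns this role."""
    view: MembershipView
    pending_joins: set = field(default_factory=set)
    pending_failures: set = field(default_factory=set)
    messages_sent: int = 0

    def report_join(self, node):
        self.pending_joins.add(node)

    def report_failure(self, node):
        self.pending_failures.add(node)

    def reconfigure(self, steps):
        """Apply every pending change in one new view, then run the steps in lock-step."""
        members = (self.view.members | self.pending_joins) - self.pending_failures
        new_view = MembershipView(self.view.view_id + 1, frozenset(members))
        log = []
        for step in steps:
            # Every member finishes this step before any member starts the next.
            for node in sorted(new_view.members):
                self.messages_sent += 2  # assumed: one request, one acknowledgement
                log.append((step, node))
        self.view = new_view
        self.pending_joins.clear()
        self.pending_failures.clear()
        return new_view, log


# Two joins and one failure are absorbed by a single reconfiguration.
coord = Coordinator(MembershipView(0, frozenset({1, 2, 3, 4})))
coord.report_join(5)
coord.report_join(6)
coord.report_failure(2)
new_view, log = coord.reconfigure(["kernel", "hidra"])
```

Under these assumptions the message count is 2 x (number of steps) x (number of members), which stays linear in the number of nodes for a fixed number of steps.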
