An efficient topology-adaptive membership protocol for large-scale cluster-based services

A highly available large-scale service cluster often requires the system to discover new nodes and identify failed nodes quickly in order to handle a high volume of traffic. Determining node membership promptly in such an environment is critical to location-transparent service invocation, load balancing, and failure shielding. In this paper, we present a topology-adaptive hierarchical membership service that dynamically divides the entire cluster into membership groups based on the network topology among nodes, so that the liveness of a node within each group is published to the others efficiently. The proposed approach has been compared with two alternatives: an all-to-all multicast approach and a gossip-based approach. The results show that the proposed approach is scalable and effective in terms of high membership accuracy, short view convergence time, and low communication cost.
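The idea of topology-based membership groups can be illustrated with a minimal sketch. It is not the protocol from the paper: the topology key (a rack or subnet label), the heartbeat timeout, the lowest-ID leader rule, and all function names (`form_groups`, `live_members`, `cluster_view`) are assumptions made for illustration only.

```python
from collections import defaultdict


def form_groups(nodes):
    """Group node IDs by a topology key.

    `nodes` maps node_id -> segment label (hypothetical: a rack or subnet
    name standing in for whatever topology information is available).
    """
    groups = defaultdict(list)
    for node_id, segment in nodes.items():
        groups[segment].append(node_id)
    return {segment: sorted(members) for segment, members in groups.items()}


def live_members(members, last_heartbeat, now, timeout):
    """Return the members whose last heartbeat is within `timeout` seconds."""
    return [
        m for m in members
        if now - last_heartbeat.get(m, float("-inf")) <= timeout
    ]


def cluster_view(groups, last_heartbeat, now, timeout=3.0):
    """Build a per-group liveness summary.

    Assumption: the live node with the lowest ID acts as group leader and
    would be the one to publish this summary to the other groups.
    """
    view = {}
    for segment, members in groups.items():
        alive = live_members(members, last_heartbeat, now, timeout)
        if alive:
            view[segment] = {"leader": alive[0], "alive": alive}
    return view


if __name__ == "__main__":
    nodes = {"n1": "rackA", "n2": "rackA", "n3": "rackB"}
    heartbeats = {"n1": 10.0, "n2": 4.0, "n3": 9.5}
    print(cluster_view(form_groups(nodes), heartbeats, now=11.0))
```

In this sketch, liveness is tracked only among nodes that share a segment, and each group exposes a single summary. That is one way to keep traffic lower than all-to-all heartbeats, under the stated assumptions.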
