GEMS: GOSSIP-ENABLED MONITORING SERVICE FOR HETEROGENEOUS DISTRIBUTED SYSTEMS

Gossip protocols provide a scalable means of detecting failures in heterogeneous distributed systems asynchronously, without the limitations associated with group communication. In this paper, we discuss the development and features of a hierarchical Gossip-Enabled Monitoring Service (GEMS), which extends the gossip-style failure detection service to support resource monitoring. By dividing the system into groups of nodes and layers of communication, GEMS scales well. GEMS is also easily extensible and incorporates facilities for distributing arbitrary system and application-specific data. We present experiments and analytical projections that demonstrate fast response times and low resource utilization, making GEMS well suited to resource monitoring in distributed computing. We also demonstrate the utility of GEMS through the development of a simple dynamic load balancing service for which GEMS forms the information base.
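To make the idea of gossip-style failure detection with piggybacked resource data concrete, the following is a minimal sketch and not the GEMS implementation. It assumes a flat (single-layer) scheme in which each node keeps a table of heartbeat counters, merges tables received from peers by keeping the entry with the higher counter, and suspects a node whose counter has not advanced for `t_fail` local ticks. The names `Entry`, `Node`, `t_fail`, `load`, and `gossip_round` are hypothetical, and the hierarchical grouping described in the abstract is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class Entry:
    heartbeat: int = 0    # highest heartbeat counter seen for this node
    last_update: int = 0  # local clock value when the counter last advanced
    load: float = 0.0     # piggybacked resource datum (assumed: a load figure)


class Node:
    def __init__(self, name, t_fail=5):
        self.name = name
        self.t_fail = t_fail  # assumed timeout, measured in local ticks
        self.clock = 0
        self.table = {name: Entry()}

    def tick(self, load):
        """Advance the local clock and refresh this node's own entry."""
        self.clock += 1
        me = self.table[self.name]
        me.heartbeat += 1
        me.last_update = self.clock
        me.load = load

    def merge(self, remote_table):
        """Adopt any remote entry that carries a newer heartbeat counter."""
        for name, remote in remote_table.items():
            local = self.table.get(name)
            if local is None or remote.heartbeat > local.heartbeat:
                self.table[name] = Entry(remote.heartbeat, self.clock, remote.load)

    def suspected(self):
        """Names whose heartbeat has not advanced within t_fail ticks."""
        return sorted(
            name for name, entry in self.table.items()
            if self.clock - entry.last_update > self.t_fail
        )


def gossip_round(nodes, loads, rng):
    """One round: every node ticks, then sends its table to one random peer."""
    for node in nodes:
        node.tick(loads[node.name])
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        peer.merge(node.table)


if __name__ == "__main__":
    rng = random.Random(0)
    cluster = [Node("a"), Node("b"), Node("c")]
    for _ in range(10):
        gossip_round(cluster, {"a": 0.2, "b": 0.5, "c": 0.9}, rng)
    print(cluster[0].table)
```

Because the resource datum travels in the same table as the heartbeat, a consumer such as a load balancer could read `table[name].load` locally without querying the remote node, which is the sense in which a monitoring service can act as an information base.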
