Load-balanced data aggregation tree construction for large scale cluster monitoring system

A cluster monitoring system observes the operation of the cluster, analyzes performance data, and displays the results. It is crucial for cluster management and performance measurement, because both end users and system administrators can use the monitoring data to diagnose problems and to suggest remedies. Scalable resource monitoring is therefore essential to cluster management. This paper proposes a scalable cluster monitoring architecture that builds a structured data aggregation tree (DAT) of master monitoring nodes by using the Chord P2P algorithm. The DAT leverages the Chord topology and routing mechanisms: it is constructed implicitly from native Chord routing paths, without prior configuration of monitoring node membership or topology. To balance the storage space used by monitoring data and the computing load across monitoring nodes, we propose a balanced routing algorithm that dynamically selects the parent of a node from among its finger nodes according to the node's distance to the root. We have evaluated the performance and scalability of our DAT-based monitoring system with up to 2500 nodes in a simulated environment. Our experimental results show that the monitoring system based on the balanced DAT scheme scales well to a large number of nodes. Because parent-child relationships are not explicitly configured, the system adapts well to node arrival and departure and can be easily deployed.
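The parent-selection idea can be illustrated with a minimal Python sketch. It is not the paper's implementation and rests on several assumptions: the Chord ring is fully populated (every identifier in a 2^m space is a live node), finger i of node n points to n + 2^i, and the function `finger_cap` is a hypothetical rule that limits how far a node may jump depending on its clockwise distance to the root. The abstract only states that the parent is chosen from the finger nodes by distance to the root; the exact rule is not given there. All function names are illustrative.

```python
from collections import Counter


def clockwise_distance(node, root, ring_size):
    """Clockwise distance from node to root on the identifier ring."""
    return (root - node) % ring_size


def basic_parent(node, root, m):
    """Parent under plain greedy Chord routing: the farthest finger
    that does not overshoot the root. Returns None for the root."""
    ring = 1 << m
    d = clockwise_distance(node, root, ring)
    if d == 0:
        return None
    i = d.bit_length() - 1  # largest i with 2**i <= d
    return (node + (1 << i)) % ring


def finger_cap(d):
    """Hypothetical cap on the finger index (an assumption, not taken
    from the abstract): smallest g with 3 * 2**g >= d + 2."""
    g = 0
    while 3 * (1 << g) < d + 2:
        g += 1
    return g


def balanced_parent(node, root, m):
    """Parent under the assumed balanced rule: greedy routing, but the
    finger index may not exceed finger_cap(distance to root)."""
    ring = 1 << m
    d = clockwise_distance(node, root, ring)
    if d == 0:
        return None
    i = min(d.bit_length() - 1, finger_cap(d))
    return (node + (1 << i)) % ring


def max_fanout(parent_fn, root, m):
    """Largest number of children any node has in the induced tree."""
    counts = Counter()
    for node in range(1 << m):
        parent = parent_fn(node, root, m)
        if parent is not None:
            counts[parent] += 1
    return max(counts.values())


if __name__ == "__main__":
    m, root = 4, 0  # 16-node ring, root at identifier 0
    print("basic max fanout:", max_fanout(basic_parent, root, m))
    print("balanced max fanout:", max_fanout(balanced_parent, root, m))
```

On this 16-node example, plain greedy routing gives the root four children (one per finger), while the capped rule leaves no node with more than two. In both variants the chosen step never exceeds the remaining distance, so every node still reaches the root, and the tree follows from routing alone without any configured parent-child links.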
