A performance monitoring system for large computing clusters

In this paper we describe the architecture of PerfMC, a performance monitoring system for clusters of workstations, and present a prototype implementation of that architecture. PerfMC is driven by an XML configuration file and uses the Simple Network Management Protocol (SNMP) to collect statistics from each networked device. The collected data are stored on the local disk of the monitoring station in a compact format, and various graphical and statistical analyses can be performed off-line. The monitoring tool embeds an HTTP server that can generate various types of graphs from the collected data. Moreover, the HTTP server can generate arbitrary XML pages by dynamically applying XSLT stylesheets to an internal XML representation of the cluster's status. The heavy use of XML-based technologies differentiates the proposed approach from traditional monitoring tools.
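To make the configuration-driven design concrete, the following is a minimal sketch of how an XML configuration file could be turned into a list of SNMP polling targets. The abstract does not give PerfMC's actual schema, so the element names (`cluster`, `host`, `counter`), the attributes, the addresses, and the `load_targets` helper are all assumptions made for illustration. The OIDs are standard MIB-II objects (ifInOctets, ifOutOctets, sysUpTime). The sketch only parses the configuration; it does not issue SNMP requests.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: element and attribute names are illustrative
# only and are NOT taken from PerfMC's real configuration format.
CONFIG = """\
<cluster name="demo" interval="30">
  <host name="node01" address="192.168.0.1" community="public">
    <counter oid="1.3.6.1.2.1.2.2.1.10.1" label="ifInOctets"/>
    <counter oid="1.3.6.1.2.1.2.2.1.16.1" label="ifOutOctets"/>
  </host>
  <host name="node02" address="192.168.0.2" community="public">
    <counter oid="1.3.6.1.2.1.1.3.0" label="sysUpTime"/>
  </host>
</cluster>
"""


def load_targets(xml_text):
    """Parse the config and return (interval_seconds, targets).

    Each target is a (host name, address, OID, label) tuple, one per
    counter that a monitoring station would poll.
    """
    root = ET.fromstring(xml_text)
    # Fall back to 60 seconds if the (assumed) interval attribute is absent.
    interval = int(root.get("interval", "60"))
    targets = []
    for host in root.findall("host"):
        for counter in host.findall("counter"):
            targets.append((
                host.get("name"),
                host.get("address"),
                counter.get("oid"),
                counter.get("label"),
            ))
    return interval, targets


if __name__ == "__main__":
    interval, targets = load_targets(CONFIG)
    print(f"poll every {interval}s")
    for name, address, oid, label in targets:
        print(f"{name} ({address}): {label} = {oid}")
```

A real monitor would hand each target to an SNMP client on every polling interval and append the returned values to its on-disk store; those steps are omitted here because the abstract does not describe them in detail.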
