IPMI-based Efficient Notification Framework for Large Scale Cluster Computing

The demand for an efficient fault tolerance system has led to the development of complex monitoring infrastructure, which in turn has created an overwhelming task of data and event management. The increasing level of detail at the hardware and software layers clearly affects the scalability and performance of monitoring and management tools. In this paper, we propose a problem notification framework that directly addresses the issue of monitor scalability. We first present the design and implementation of our step-by-step approach to analyzing, filtering, and classifying the plethora of node statistics. Then, we present experimental results to show that our approach needs only minimal system resources and thus has low overhead. Finally, we introduce our web-based cluster management system that provides hardware controls at both cluster and nodal levels.
