A visualization tool for analyzing cluster performance data

This paper describes a unique visualization tool that has been used to analyze the performance of the Cplant(tm) clusters at Sandia National Laboratories. As commodity cluster systems grow in size and complexity, understanding performance issues becomes increasingly difficult. We have developed a tool that facilitates visual performance analysis within the context of the physical and runtime environment of a system. Combining an abstract system model with color-coding for both performance and job information enables quick fault isolation as well as insight into complex system behavior.
