Case study: visual debugging of cluster hardware

This paper presents a novel use of visualization for debugging the Cplant™ cluster hardware at Sandia National Laboratories. As commodity cluster systems grow in popularity and size, tracking component failures within the hardware will become increasingly difficult. We have developed a tool that facilitates visual debugging of errors within the switches and cables connecting the processors. Combining an abstract system model with color-coding for both error and job information enables failing components to be identified.
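
To illustrate the color-coding idea, here is a minimal sketch that maps per-link error counts to a green-to-red color ramp. It is not the tool described in the paper: the `Link` model, the function names, the switch labels, and the linear ramp are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A cable between two switches (hypothetical model, not the paper's)."""
    src: str
    dst: str
    errors: int  # error count observed on this link


def error_color(errors: int, max_errors: int) -> tuple[int, int, int]:
    """Map an error count to RGB: green for no errors, red for the worst link."""
    if max_errors <= 0 or errors <= 0:
        return (0, 255, 0)
    t = min(errors / max_errors, 1.0)  # normalized severity in [0, 1]
    return (round(255 * t), round(255 * (1 - t)), 0)


def color_links(links: list[Link]) -> dict[tuple[str, str], tuple[int, int, int]]:
    """Assign a color to every link, scaled against the worst error count."""
    worst = max((link.errors for link in links), default=0)
    return {(link.src, link.dst): error_color(link.errors, worst) for link in links}


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [Link("sw0", "sw1", 0), Link("sw1", "sw2", 5), Link("sw2", "sw3", 10)]
    for endpoints, rgb in color_links(sample).items():
        print(endpoints, rgb)
```

Job information could be overlaid in a similar way, for example by using a second visual channel such as an outline color per job, though that is an assumption rather than something the abstract specifies.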
