DIMVisual: Data Integration Model for Visualization of Parallel Programs Behavior

Developing high-performance parallel applications for clusters is considered a complex task, owing to the influence of the execution environment and the inherently non-deterministic behavior of this kind of application. During development, the programmer uses application traces and cluster monitoring tools to record the events of the application and of the underlying execution environment. The information from each source is generally analyzed independently, which makes it difficult to correlate events from the application with events from the execution environment. This paper presents DIMVisual, a data integration model that addresses this problem by integrating information from different sources and providing a unified visualization. An implementation of the model is also presented, using as data sources traces from MPI and DECK applications, events from the Ganglia and Performance Co-Pilot cluster monitoring tools, and operating system context switches. The results show the information gathered from these data sources integrated and visualized together in the generic visualization tool Pajé, giving the programmer a more complete view of the application's behavior.
