Event-based Program Analysis with DeWiz

Because parallel and distributed programs are increasingly complex, debugging them is considered the most difficult and time-consuming part of the software lifecycle. Tool support is therefore crucial for hiding this complexity from the user. However, most existing tools seem inadequate as soon as the program under consideration uses more than a few processors over a long execution time. This problem is addressed by the novel debugging tool DeWiz (Debugging Wizard), whose focus lies on scalability. DeWiz has a modular, scalable architecture and uses the event graph model to represent the investigated program. It provides a set of modules that can be combined to generate, analyze, and visualize event graph data. Within this processing pipeline, the toolset tries to extract useful information, which is presented to the user at an arbitrary level of abstraction. In addition, DeWiz is a framework that can be used to easily implement arbitrary user-defined modules.
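The idea of combining modules into a pipeline over event graph data can be illustrated with a minimal sketch. This is not the DeWiz interface: the `Event` record, the module functions, and `run_pipeline` are hypothetical names chosen for illustration, and the trace format (process, index, kind, partner) is an assumption. Each module consumes a stream of events and produces a stream of events, so generation, filtering, and analysis steps can be chained freely.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Event:
    """One node of an event graph (hypothetical layout, not the DeWiz format)."""
    process: int       # process on which the event occurred
    index: int         # position of the event on that process
    kind: str          # assumed labels: "send" or "recv"
    partner: int = -1  # peer process of a communication event, -1 if none


# A module maps a stream of events to a stream of events.
Module = Callable[[Iterable[Event]], Iterator[Event]]


def generate(trace: Iterable[Tuple[int, int, str, int]]) -> Iterator[Event]:
    """Producer module: turn raw trace tuples into events."""
    for process, index, kind, partner in trace:
        yield Event(process, index, kind, partner)


def only_kind(kind: str) -> Module:
    """Filter module factory: keep only events of the given kind."""
    def module(events: Iterable[Event]) -> Iterator[Event]:
        for event in events:
            if event.kind == kind:
                yield event
    return module


def unmatched_receives(events: Iterable[Event]) -> Iterator[Event]:
    """Analysis module: yield receive events that have no corresponding send."""
    collected = list(events)
    # Count sends per (sender, receiver) pair.
    sends = Counter(
        (e.process, e.partner) for e in collected if e.kind == "send"
    )
    for event in collected:
        if event.kind != "recv":
            continue
        key = (event.partner, event.process)
        if sends[key] > 0:
            sends[key] -= 1   # consume one matching send
        else:
            yield event       # no send left for this receive


def run_pipeline(trace, *modules: Module) -> List[Event]:
    """Chain the producer with the given modules and collect the result."""
    stream: Iterable[Event] = generate(trace)
    for module in modules:
        stream = module(stream)
    return list(stream)


# Example trace: one send from process 0 to 1, but two receives on process 1.
trace = [(0, 0, "send", 1), (1, 0, "recv", 0), (1, 1, "recv", 0)]
suspicious = run_pipeline(trace, unmatched_receives)
```

In this sketch a user-defined module is simply another function with the same stream-in, stream-out shape, which is one plausible way to read the claim that arbitrary modules can be added to the pipeline.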
