DeWiz - Event-Based Debugging on the Grid

Debugging locates the cause of incorrect program behavior by analyzing the states that occur during a program's execution. Because the amount of state data directly affects the debugging process, grid applications, which may process enormous amounts of application data over long execution times, require dedicated analysis functionality. The DeWiz tool addresses this problem with an extensible set of event-based analysis modules that operate on an event graph model of the target program's behavior. Debugging tasks are specified by arranging and connecting the DeWiz modules. Owing to its selectable abstraction level and its universal applicability, DeWiz is suited to parallel and distributed programs. Because time-consuming analysis tasks in distinct modules can be run on the grid, even large amounts of state data can be processed and investigated. This makes it possible to apply debugging activities to large-scale grid applications, which are the most challenging targets.
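
The idea of specifying a debugging task by arranging and connecting modules can be illustrated with a minimal Python sketch. It is not DeWiz code: the `Event` record, the two modules, and the `connect` helper are hypothetical names chosen for this illustration, and a flat list of events stands in for the event graph. The sketch assumes that each event is identified by a (process, position) pair and that a communication event names its partner event.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical event identifier: (process number, position on that process).
EventId = Tuple[int, int]


@dataclass(frozen=True)
class Event:
    eid: EventId
    kind: str                           # e.g. "send", "recv", "compute"
    partner: Optional[EventId] = None   # matching event on another process


# A module consumes a stream of events and produces a stream of events.
Module = Callable[[Iterable[Event]], Iterable[Event]]


def only_communication(events: Iterable[Event]) -> Iterable[Event]:
    """Abstraction module: keep only message-passing events."""
    return (e for e in events if e.kind in ("send", "recv"))


def unmatched_receives(events: Iterable[Event]) -> Iterable[Event]:
    """Analysis module: keep receives whose partner send is not in the stream."""
    events = list(events)
    sends = {e.eid for e in events if e.kind == "send"}
    return (e for e in events if e.kind == "recv" and e.partner not in sends)


def connect(*modules: Module) -> Module:
    """Chain modules so that the output of one feeds the next."""
    def pipeline(events: Iterable[Event]) -> Iterable[Event]:
        for module in modules:
            events = module(events)
        return events
    return pipeline


# Illustrative trace, invented for this example.
trace = [
    Event((0, 0), "compute"),
    Event((0, 1), "send", partner=(1, 0)),
    Event((1, 0), "recv", partner=(0, 1)),
    Event((1, 1), "recv", partner=(0, 2)),  # no send (0, 2) exists in the trace
]

analysis = connect(only_communication, unmatched_receives)
suspicious = list(analysis(trace))
```

Here the first module raises the abstraction level by discarding computation events, and the second performs one specific analysis. In this picture, changing the debugging task amounts to passing a different sequence of modules to `connect`.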
