A debugger for distributed programs

Developing a distributed debugger is much more complex than developing a sequential debugger. This added complexity is mainly due to the non‐determinism of events that communication delays introduce into distributed systems. We explore the problems that one must address when designing a distributed program debugger and then describe our design and implementation of DPD (distributed program debugger). The problems addressed include non‐determinism of events, finding consistent system states, setting breakpoints, recording events, and checkpointing. Important features of DPD include dynamic rollback and replay, as well as a graphical user interface. DPD has been tested successfully in debugging distributed programs within a distributed facility called REM (remote execution manager).
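To make "finding consistent system states" concrete, here is a minimal sketch of one standard way to test whether a set of per-process states forms a consistent global state. It assumes events are timestamped with vector clocks, which the abstract does not say DPD uses; the function name `is_consistent_cut` and the data layout are hypothetical, chosen only for illustration.

```python
from typing import Dict, List

# Assumption: each process i timestamps its events with a vector clock,
# a list where entry k counts the events of process k known to have happened.
VectorClock = List[int]


def is_consistent_cut(cut: Dict[int, VectorClock]) -> bool:
    """Return True if the cut is a consistent global state.

    `cut` maps a process index i to the vector clock of the last event
    of process i that is included in the cut. The cut is consistent when
    no process has seen more events of process i than process i itself
    has included, i.e. cut[j][i] <= cut[i][i] for all i and j. Otherwise
    some message would be received inside the cut but sent outside it.
    """
    for i, clock_i in cut.items():
        for clock_j in cut.values():
            if clock_j[i] > clock_i[i]:
                return False
    return True


# Two processes: P0 sends a message at its first event, P1 receives it.
send_event = [1, 0]     # P0 after sending
receive_event = [1, 1]  # P1 after receiving
initial = [0, 0]

# Receive is in the cut but the send is not: inconsistent.
bad_cut = {0: initial, 1: receive_event}
# Both send and receive are in the cut: consistent.
good_cut = {0: send_event, 1: receive_event}
# Send is in the cut, message still in transit: also consistent.
in_transit_cut = {0: send_event, 1: initial}
```

A debugger that halts processes at a breakpoint or rolls them back to checkpoints needs some check of this kind, because a state in which a message has been received but never sent could not have occurred in any real execution.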
