Global events and global breakpoints in distributed systems

A solution to the problem of setting breakpoints in distributed systems is described. It is shown which kinds of breakpoints are possible, how to detect those breakpoints, and how to halt the system in a consistent state. The communication among the processes may be asynchronous with an arbitrary ordering of messages. The algorithms distinguish between simultaneous events and those ordered according to L. Lamport's happened-before relation (1978). The mechanisms for definition and detection of global breakpoints are implemented in the distributed debugging system DTM (distributed test methodology).

[1] Leslie Lamport, et al. Time, clocks, and the ordering of events in a distributed system, 1978, CACM.

[2] J. Nievergelt, et al. Special Feature: Monitoring Program Execution: A Survey, 1981, Computer.

[3] Bernhard Plattner, et al. Monitoring Program Execution: A Survey, 1981.

[4] Larry D. Wittie, et al. BUGNET: A Debugging system for parallel programming environments, 1982, ICDCS.

[5] M. G. Smith, et al. A Distributed System Experimentation Facility, 1982, ICDCS.

[6] Jack C. Wileden, et al. High-level debugging of distributed systems: The behavioral abstraction approach, 1983, J. Syst. Softw.

[7] Hector Garcia-Molina, et al. Debugging a Distributed Computing System, 1984, IEEE Transactions on Software Engineering.

[8] Roger King, et al. IDD: An Interactive Distributed Debugger, 1985, ICDCS.

[9] Leslie Lamport, et al. Distributed snapshots: determining global states of distributed systems, 1985, TOCS.

[10] Edward T. Smith. A debugger for message‐based processes, 1985, Softw. Pract. Exp.

[11] Barton P. Miller, et al. A distributed programs monitor for berkeley UNIX, 1985, Softw. Pract. Exp.

[12] Friedemann Mattern, et al. Key Concepts of the INCAS Multicomputer Project, 1987, IEEE Transactions on Software Engineering.

[13] Thomas J. LeBlanc, et al. Debugging Parallel Programs with Instant Replay, 1987, IEEE Transactions on Computers.

[14] Konrad Slind, et al. Monitoring distributed systems, 1987, TOCS.

[15] Jong-Deok Choi, et al. Breakpoints and halting in distributed programs, 1988, [1988] Proceedings. The 8th International Conference on Distributed.

[16] D. Haban, et al. Monitoring and performance measuring distributed systems during operation, 1988, SIGMETRICS 1988.