A Taxonomy of Execution Replay Systems

Debugging a faulty program can be hard and time-consuming. The programmer typically re-executes the program repeatedly while zooming in on the root cause of the bug. Some bugs, however, appear only intermittently, which makes them even harder to track down. The main reason is that numerous non-deterministic events take place within the computer while a program executes, and these can wreck even a carefully crafted debugging session. To deal with these ghostly bugs, one needs to remove the non-determinism from the execution and its environment. To this end, various so-called execution replay systems have been devised, each with its own merits and limitations. We give an overview of the terminology used when discussing execution replay, of the causes of non-determinism within a computer, and of the current state of the art in execution replay systems.
