Enabling tracing of long-running multithreaded programs via dynamic execution reduction

Debugging long-running multithreaded programs with tracing-based analyses is very challenging. Because such programs are non-deterministic, reproducing a bug is non-trivial, and generating and inspecting traces for long-running programs can be prohibitively expensive. We propose a framework in which, to overcome the problem of bug reproducibility, a lightweight logging technique records events during the original execution. When a bug is encountered, it is reproduced using the generated log, and during the replay a fine-grained tracing technique collects control-flow and dependence traces that are then used to locate the root cause of the bug. In this paper, we address the key challenges that arise from tracing: the prohibitively high cost of collecting traces and the significant burden on the user, who must examine a large amount of trace information to locate the bug in a long-running multithreaded program. These challenges are addressed through execution reduction, which combines logging and tracing so that the collected traces contain execution information only from those regions of threads that are relevant to the fault. This approach is highly effective because, as we observe, many of the threads that execute in long-running multithreaded programs are irrelevant to the fault. Hence, these threads need not be replayed and traced when trying to reproduce the bug. We develop a novel lightweight scheme that identifies such threads by observing all the interthread data dependences and removes their execution footprint from the replay run. In addition, we identify regions of thread executions that need not be replayed or that, if they must be replayed, need not be traced. Following execution reduction, the replayed execution takes less time to run and produces a much smaller trace than the original execution. Thus, the cost of collecting traces and the effort of examining the traces to locate the fault are greatly reduced.
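
The thread-selection idea described above can be illustrated with a minimal sketch. It is not the paper's actual scheme. It assumes that the logging phase yields a list of interthread data dependences as `(consumer, producer)` thread-id pairs, and it treats a thread as relevant if the failing thread depends on it directly or transitively. The function name `relevant_threads`, the pair format, and the thread ids are all hypothetical, and the analysis ignores the timing of dependences, so it may keep more threads than a finer-grained scheme would.

```python
from collections import defaultdict, deque


def relevant_threads(dependences, faulty_thread):
    """Return the set of threads the faulty thread transitively depends on.

    `dependences` is an iterable of (consumer, producer) pairs, meaning the
    consumer thread read a value written by the producer thread. This pair
    format is an assumption made for illustration.
    """
    # Map each thread to the set of threads whose writes it consumed.
    producers = defaultdict(set)
    for consumer, producer in dependences:
        if consumer != producer:
            producers[consumer].add(producer)

    # Backward breadth-first traversal starting from the faulty thread.
    relevant = {faulty_thread}
    worklist = deque([faulty_thread])
    while worklist:
        thread = worklist.popleft()
        for producer in producers[thread]:
            if producer not in relevant:
                relevant.add(producer)
                worklist.append(producer)
    return relevant


# Hypothetical log: T3 fails; T2 and T4 only exchange data with each other.
deps = [("T3", "T1"), ("T1", "T0"), ("T2", "T4"), ("T4", "T2")]
all_threads = {"T0", "T1", "T2", "T3", "T4"}
keep = relevant_threads(deps, "T3")
skip = all_threads - keep  # candidates to drop from replay and tracing
```

In this example only `T0`, `T1`, and `T3` would be replayed and traced, while `T2` and `T4` could be left out of the replay run.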
