Analyzing multicore dumps to facilitate concurrency bug reproduction

Debugging concurrent programs is difficult. This is primarily because the inherent non-determinism that arises because of scheduler interleavings makes it hard to easily reproduce bugs that may manifest only under certain interleavings. The problem is exacerbated in multi-core environments where there are multiple schedulers, one for each core. In this paper, we propose a reproduction technique for concurrent programs that execute on multi-core platforms. Our technique performs a lightweight analysis of a failing execution that occurs in a multi-core environment, and uses the result of the analysis to enable reproduction of the bug in a single-core system, under the control of a deterministic scheduler. More specifically, our approach automatically identifies the execution point in the re-execution that corresponds to the failure point. It does so by analyzing the failure core dump and leveraging a technique called execution indexing that identifies a related point in the re-execution. By generating a core dump at this point, and comparing the differences betwen the two dumps, we are able to guide a search algorithm to efficiently generate a failure inducing schedule. Our experiments show that our technique is highly effective and has reasonable overhead.

[1]  Josep Torrellas,et al.  Capo: a software-hardware interface for practical deterministic multiprocessor replay , 2009, ASPLOS.

[2]  Philip S. Yu,et al.  Privacy-preserving data publishing: A survey of recent developments , 2010, CSUR.

[3]  Srikanth Kandula,et al.  Flashback: A Lightweight Extension for Rollback and Deterministic Replay for Software Debugging , 2004, USENIX Annual Technical Conference, General Track.

[4]  Koushik Sen,et al.  Race directed random testing of concurrent programs , 2008, PLDI '08.

[5]  Jong-Deok Choi,et al.  Deterministic replay of Java multithreaded applications , 1998, SPDT '98.

[6]  Koushik Sen,et al.  A randomized dynamic program analysis technique for detecting real deadlocks , 2009, PLDI '09.

[7]  Joe D. Warren,et al.  The program dependence graph and its use in optimization , 1987, TOPL.

[8]  Alaa R. Alameldeen,et al.  Addressing Workload Variability in Architectural Simulations , 2003, IEEE Micro.

[9]  Xiangyu Zhang,et al.  Cost and precision tradeoffs of dynamic data slicing algorithms , 2005, TOPL.

[10]  Peter M. Chen,et al.  Execution replay of multiprocessor virtual machines , 2008, VEE '08.

[11]  Nicholas Nethercote,et al.  Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.

[12]  Robert H. B. Netzer,et al.  Optimal tracing and incremental reexecution for debugging long-running programs , 1994, PLDI '94.

[13]  ZhangYoutao,et al.  Cost and precision tradeoffs of dynamic data slicing algorithms , 2005 .

[14]  Tom Ridge,et al.  The semantics of x86-CC multiprocessor machine code , 2009, POPL '09.

[15]  Yasushi Saito,et al.  Jockey: a user-space library for record-replay debugging , 2005, AADEBUG'05.

[16]  Satish Narayanasamy,et al.  Recording shared memory dependencies using strata , 2006, ASPLOS XII.

[17]  Xuezheng Liu,et al.  Usenix Association 8th Usenix Symposium on Operating Systems Design and Implementation R2: an Application-level Kernel for Record and Replay , 2022 .

[18]  Hans-Juergen Boehm,et al.  Garbage collection in an uncooperative environment , 1988, Softw. Pract. Exp..

[19]  Koen De Bosschere,et al.  Record/replay for nondeterministic program executions , 2003, CACM.

[20]  Mark A. Linton,et al.  Supporting reverse execution for parallel programs , 1988, PADD '88.

[21]  Yuanyuan Zhou,et al.  CTrigger: exposing atomicity violation bugs from their hiding places , 2009, ASPLOS.

[22]  Madan Musuvathi,et al.  Iterative context bounding for systematic testing of multithreaded programs , 2007, PLDI '07.

[23]  Michael D. Bond,et al.  Probabilistic calling context , 2007, OOPSLA.

[24]  Janusz W. Laski,et al.  Dynamic Program Slicing , 1988, Inf. Process. Lett..

[25]  Xiangyu Zhang,et al.  Efficient program execution indexing , 2008, PLDI '08.

[26]  Samuel T. King,et al.  Debugging Operating Systems with Time-Traveling Virtual Machines (Awarded General Track Best Paper Award!) , 2005, USENIX Annual Technical Conference, General Track.

[27]  Yuanyuan Zhou,et al.  PRES: probabilistic replay with execution sketching on multiprocessors , 2009, SOSP '09.

[28]  Anant Agarwal,et al.  TraceBack: first fault diagnosis by reconstruction of distributed control flow , 2005, PLDI '05.

[29]  Sanjay Bhansali,et al.  Framework for instruction-level tracing and analysis of program executions , 2006, VEE '06.

[30]  Ion Stoica,et al.  ODR: output-deterministic replay for multicore debugging , 2009, SOSP '09.