Semi-automated debugging via binary search through a process lifetime

A common programmer experience is to launch a long-running computation, only to see a bug crash the program after hours or days. While it is often easy to capture a "buggy" expression value at the point of the crash, it is much harder to discover the point in the program where the expression first became buggy. For such "difficult" bugs, this work presents an automated tool based on binary search through a process lifetime. The tool operates on both single-threaded and multi-threaded programs. The underlying algorithm depends on checkpoints, deterministic replay, and decomposition of debugging histories. The tool is scalable in the sense that its running time is only a small constant factor greater than the standalone running time. Further, it requires only a logarithmic number of probes of the expression value, which is an advantage when evaluating the expression is expensive. The algorithm is demonstrated on real-world programs such as MySQL.
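The core idea of binary search through a process lifetime can be illustrated with a small simulation. The sketch below is not the tool's implementation. It assumes a deterministic program modeled as a hypothetical `step(state, i)` function, uses in-memory deep copies in place of real process checkpoints, and uses plain re-execution from a checkpoint in place of deterministic replay. It also assumes the expression is monotone: once it becomes buggy, it stays buggy. A first binary search over the saved checkpoints brackets the bug. A second binary search, which replays forward from the last good checkpoint, narrows it down to a single step. The total number of probes grows logarithmically with the number of executed steps.

```python
import copy
from typing import Callable, List, Tuple

State = dict
Step = Callable[[State, int], None]   # hypothetical: mutates state for step index i
Predicate = Callable[[State], bool]   # True if the probed expression is "buggy"


def record_run(initial: State, step: Step, n_steps: int, interval: int) -> List[Tuple[int, State]]:
    """Run once, saving (steps executed, state copy) every `interval` steps (stand-in for checkpoints)."""
    checkpoints = [(0, copy.deepcopy(initial))]
    state = copy.deepcopy(initial)
    for i in range(n_steps):
        step(state, i)
        if (i + 1) % interval == 0 or i + 1 == n_steps:
            checkpoints.append((i + 1, copy.deepcopy(state)))
    return checkpoints


def replay(ckpt: Tuple[int, State], target: int, step: Step) -> State:
    """Re-execute deterministically from a checkpoint until `target` steps have run."""
    start, saved = ckpt
    state = copy.deepcopy(saved)  # never mutate the stored checkpoint
    for i in range(start, target):
        step(state, i)
    return state


def first_buggy_step(initial: State, step: Step, n_steps: int, interval: int,
                     is_buggy: Predicate) -> Tuple[int, int]:
    """Return (t, probes): t is the smallest step count after which is_buggy holds.

    Assumes the initial state is good, the final state is buggy, and bugginess is monotone.
    """
    ckpts = record_run(initial, step, n_steps, interval)
    probes = 0
    # Phase 1: binary search over checkpoints to bracket the bug.
    lo, hi = 0, len(ckpts) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probes += 1
        if is_buggy(ckpts[mid][1]):
            hi = mid
        else:
            lo = mid
    # Phase 2: binary search over step counts, replaying from the last good checkpoint.
    good, bad = ckpts[lo][0], ckpts[hi][0]
    while bad - good > 1:
        mid = (good + bad) // 2
        probes += 1
        if is_buggy(replay(ckpts[lo], mid, step)):
            bad = mid
        else:
            good = mid
    return bad, probes


def demo_step(state: State, i: int) -> None:
    """Hypothetical program: counts steps and silently corrupts its state at step 6789."""
    state["count"] = i + 1
    if i == 6789:
        state["corrupt"] = True


if __name__ == "__main__":
    t, probes = first_buggy_step({"count": 0}, demo_step, 10_000, 100,
                                 lambda s: s.get("corrupt", False))
    print(t, probes)  # 6790 14
```

Each phase-2 probe replays at most `interval` steps from a checkpoint. This is why, under these assumptions, the extra running time stays within a constant factor of a single standalone run, while the expression is evaluated only about log2(10,000), or roughly 14, times.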
