BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging

Significant time is spent by companies trying to reproduce and fix the bugs that occur for released code. To assist developers, we propose the BugNet architecture to continuously record information on production runs. The information collected before the crash of a program can be used by the developers working in their execution environment to deterministically replay the last several million instructions executed before the crash. BugNet is based on the insight that recording the register file contents at any point in time, and then recording the load values that occur after that point can enable deterministic replaying of a programýs execution. BugNet focuses on being able to replay the applicationýs execution and the libraries it uses, but not the operating system. But our approach provides the ability to replay an applicationýs execution across context switches and interrupts. Hence, BugNet obviates the need for tracking program I/O, interrupts and DMA transfers, which would have otherwise required more complex hardware support. In addition, BugNet does not require a final core dump of the system state for replaying, which significantly reduces the amount of data that must be sent back to the developer.

[1]  William R. Dieter,et al.  A user-level checkpointing library for POSIX threads programs , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).

[2]  Koen De Bosschere,et al.  Non-intrusive on-the-fly data race detection using execution replay , 2000, AADEBUG.

[3]  Wei Liu,et al.  iWatcher: efficient architectural support for software debugging , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[4]  Amir Roth,et al.  Low-overhead interactive debugging via dynamic instrumentation with DISE , 2005, 11th International Symposium on High-Performance Computer Architecture.

[5]  Jong-Deok Choi,et al.  Deterministic replay of Java multithreaded applications , 1998, SPDT '98.

[6]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[7]  Yuanyuan Zhou,et al.  SafeMem: exploiting ECC-memory for detecting memory leaks and memory corruption during production runs , 2005, 11th International Symposium on High-Performance Computer Architecture.

[8]  Michael Burrows,et al.  Eraser: a dynamic data race detector for multithreaded programs , 1997, TOCS.

[9]  Martin Burtscher,et al.  Automatic generation of high-performance trace compressors , 2005, International Symposium on Code Generation and Optimization.

[10]  Thomas J. LeBlanc,et al.  Debugging Parallel Programs with Instant Replay , 1987, IEEE Transactions on Computers.

[11]  Srikanth Kandula,et al.  Flashback: A Lightweight Extension for Rollback and Deterministic Replay for Software Debugging , 2004, USENIX Annual Technical Conference, General Track.

[12]  Josep Torrellas,et al.  ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes , 2003, ISCA '03.

[13]  Seth Copen Goldstein,et al.  Hardware-assisted replay of multiprocessor programs , 1991, PADD '91.

[14]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[15]  Milo M. K. Martin,et al.  SafetyNet: improving the availability of shared memory multiprocessors with global checkpoint/recovery , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.

[16]  Koen De Bosschere,et al.  Proceedings of the Fifth International Workshop on Automated Debugging (AADEBUG 2003) , 2003 .

[17]  Robert H. B. Netzer Optimal tracing and replay for debugging shared-memory parallel programs , 1993, PADD '93.

[18]  Jun Yang,et al.  Energy efficient Frequent Value data Cache design , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[19]  Josep Torrellas,et al.  ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors , 2002, ISCA.

[20]  Min Xu,et al.  A "flight data recorder" for enabling full-system multiprocessor deterministic replay , 2003, ISCA '03.

[21]  Bil Lewis,et al.  Debugging Backwards in Time , 2003, ArXiv.

[22]  Abraham Lempel,et al.  A universal algorithm for sequential data compression , 1977, IEEE Trans. Inf. Theory.

[23]  Wei Liu,et al.  AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-Based Invariants , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[24]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.