Offline symbolic analysis for multi-processor execution replay

Ability to replay a program's execution on a multi-processor system can significantly help parallel programming. To replay a shared-memory multi-threaded program, existing solutions record its program input (I/O, DMA, etc.) and the shared-memory dependencies between threads. Prior processor based record-and-replay solutions are efficient, but they require non-trivial modifications to the coherency protocol and the memory sub-system for recording the shared-memory dependencies. In this paper, we propose a processor-based record-and-replay solution that does not require detecting and logging shared-memory dependencies to enable multi-processor execution replay. We show that a load-based checkpointing scheme, which was originally proposed for just recording program input, is also sufficient for replaying every thread in a multi-threaded program. Shared-memory dependencies between threads are reconstructed offline, during replay, using an algorithm based on an SMT solver. In addition to saving log space, the proposed solution significantly reduces the complexity of hardware support required for enabling replay.

[1]  Brad Calder,et al.  Reproducible simulation of multi-threaded workloads for architecture design exploration , 2008, 2008 IEEE International Symposium on Workload Characterization.

[2]  Srikanth Kandula,et al.  Flashback: A Lightweight Extension for Rollback and Deterministic Replay for Software Debugging , 2004, USENIX Annual Technical Conference, General Track.

[3]  Josep Torrellas,et al.  Capo: a software-hardware interface for practical deterministic multiprocessor replay , 2009, ASPLOS.

[4]  Josep Torrellas,et al.  ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors , 2002, ISCA.

[5]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[6]  Bruno Dutertre,et al.  A Fast Linear-Arithmetic Solver for DPLL(T) , 2006, CAV.

[7]  Derek Hower,et al.  Rerun: Exploiting Episodes for Lightweight Memory Race Recording , 2008, 2008 International Symposium on Computer Architecture.

[8]  Satish Narayanasamy,et al.  Patching Processor Design Errors , 2006, 2006 International Conference on Computer Design.

[9]  Thomas J. LeBlanc,et al.  Debugging Parallel Programs with Instant Replay , 1987, IEEE Transactions on Computers.

[10]  Satish Narayanasamy,et al.  BugNet: continuously recording program execution for deterministic replay debugging , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[11]  Satish Narayanasamy,et al.  Recording shared memory dependencies using strata , 2006, ASPLOS XII.

[12]  Samuel T. King,et al.  ReVirt: enabling intrusion analysis through virtual-machine logging and replay , 2002, OPSR.

[13]  Sanjay Bhansali,et al.  Framework for instruction-level tracing and analysis of program executions , 2006, VEE '06.

[14]  Satish Narayanasamy,et al.  Automatic logging of operating system effects to guide application-level architecture simulation , 2006, SIGMETRICS '06/Performance '06.

[15]  Min Xu ReTrace : Collecting Execution Trace with Virtual Machine Deterministic Replay , 2007 .

[16]  Satish Narayanasamy,et al.  Software Profiling for Deterministic Replay Debugging of User Code , 2006, SoMeT.

[17]  Josep Torrellas,et al.  BulkSC: bulk enforcement of sequential consistency , 2007, ISCA '07.

[18]  Seth Copen Goldstein,et al.  Hardware-assisted replay of multiprocessor programs , 1991, PADD '91.

[19]  Josep Torrellas,et al.  DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Ef?ciently , 2008, 2008 International Symposium on Computer Architecture.

[20]  Paul Barford,et al.  Generating representative Web workloads for network and server performance evaluation , 1998, SIGMETRICS '98/PERFORMANCE '98.

[21]  Josep Torrellas,et al.  DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Ef?ciently , 2008, International Symposium on Computer Architecture.

[22]  Min Xu,et al.  A regulated transitive reduction (RTR) for longer memory race recording , 2006, ASPLOS XII.

[23]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[24]  Milo M. K. Martin,et al.  SafetyNet: improving the availability of shared memory multiprocessors with global checkpoint/recovery , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.

[25]  Jong-Deok Choi,et al.  A perturbation-free replay platform for cross-optimized multithreaded applications , 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.

[26]  Samuel T. King,et al.  Debugging Operating Systems with Time-Traveling Virtual Machines (Awarded General Track Best Paper Award!) , 2005, USENIX Annual Technical Conference, General Track.

[27]  Min Xu,et al.  A "flight data recorder" for enabling full-system multiprocessor deterministic replay , 2003, ISCA '03.

[28]  Leslie Lamport,et al.  Time, clocks, and the ordering of events in a distributed system , 1978, CACM.