IMITATOR: A deterministic multicore replay system with refining techniques

Developing parallel programs on multicore systems poses many debugging challenges. Many researchers have succeeded in detecting parallel faults in the background with hardware assistance. However, reproducing the same faulty circumstances after a fault has occurred remains an urgent problem. Tracing the causality between events is a popular solution in current multicore systems, but it is limited by on-chip storage and tracing bandwidth. As a result, an intelligent record and replay system is key to solving future multicore debugging problems. This paper proposes IMITATOR for both trace compression and deterministic replay. In contrast to most other record and replay systems, IMITATOR introduces an additional phase, the refining phase, between the record and replay phases to significantly reduce recorder overhead while enabling faster replay. Results with the SPLASH-2 benchmarks on a 32-core system show that IMITATOR can (a) significantly reduce trace size through its trace refining techniques (to about 16% of the native trace) and (b) achieve a replay speed that is on average 1.96 times faster than a replayer using the SigRace scheme.
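To make the idea of a refining phase concrete, the following is a minimal sketch of one generic way a recorded causality trace could be shrunk before replay. It is not IMITATOR's actual refining technique, which the abstract does not describe. It assumes a hypothetical trace format in which each entry is a cross-thread dependence edge `(src_thread, src_idx, dst_thread, dst_idx)`, and it drops any edge that is already implied by another edge between the same pair of threads together with program order. The names `refine` and `respects` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical trace entry: instruction src_idx on src_thread must complete
# before instruction dst_idx on dst_thread may execute.
Edge = Tuple[int, int, int, int]  # (src_thread, src_idx, dst_thread, dst_idx)
Event = Tuple[int, int]           # (thread, instruction index)


def refine(edges: List[Edge]) -> List[Edge]:
    """Drop duplicate edges and edges implied by another edge plus program order."""
    unique = sorted(set(edges))

    def implied_by(e: Edge, f: Edge) -> bool:
        # f implies e when both connect the same threads, f's source is at or
        # after e's source, and f's destination is at or before e's destination.
        return (f != e and f[0] == e[0] and f[2] == e[2]
                and f[1] >= e[1] and f[3] <= e[3])

    return [e for e in unique if not any(implied_by(e, f) for f in unique)]


def respects(schedule: List[Event], edges: List[Edge]) -> bool:
    """Check that a replay schedule orders every edge's source before its destination."""
    pos = {event: k for k, event in enumerate(schedule)}
    return all(pos[(s, i)] < pos[(d, j)] for s, i, d, j in edges)


recorded = [(0, 5, 1, 10), (0, 3, 1, 12), (1, 2, 0, 8), (0, 5, 1, 10)]
refined = refine(recorded)

# A schedule that follows program order on each thread and the refined edges.
schedule = [(1, 2), (0, 3), (0, 5), (0, 8), (1, 10), (1, 12)]
print(len(refined), "of", len(recorded), "edges kept")
print(respects(schedule, recorded))
```

In this toy trace the edge `(0, 3, 1, 12)` is dropped because `(0, 5, 1, 10)` already forces instruction 3 on thread 0 to precede instruction 12 on thread 1. A replayer that enforces only the refined edges, plus per-thread program order, still satisfies every recorded dependence.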

[1] Min Xu, et al. A regulated transitive reduction (RTR) for longer memory race recording, 2006, ASPLOS XII.

[2] Anoop Gupta, et al. The SPLASH-2 programs: characterization and methodological considerations, 1995, ISCA.

[3] Ali-Reza Adl-Tabatabai, et al. Architecting a chunk-based memory race recorder in modern CMPs, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[4] Mukesh Singhal, et al. Logical Time: Capturing Causality in Distributed Systems, 1996, Computer.

[5] Josep Torrellas, et al. SigRace: signature-based data race detection, 2009, ISCA '09.

[6] Derek Hower, et al. Rerun: Exploiting Episodes for Lightweight Memory Race Recording, 2008, 2008 International Symposium on Computer Architecture.