Samsara: Efficient Deterministic Replay in Multiprocessor Environments with Hardware Virtualization Extensions

Deterministic replay, which provides the ability to travel backward in time and reconstruct the past execution flow of a multiprocessor system, has many prominent applications. Prior research in this area can be classified into two categories: hardware-only schemes and software-only schemes. While hardware-only schemes deliver high performance, they require significant modifications to existing hardware, which makes them difficult to deploy in real systems. In contrast, software-only schemes work on commodity hardware but suffer from excessive performance overhead and huge logs caused by tracing every single memory access in the software layer. In this paper, we present the design and implementation of a novel system, Samsara, which uses hardware-assisted virtualization (HAV) extensions to achieve efficient and practical deterministic replay without requiring any hardware modification. Unlike prior software schemes, which trace every single memory access to record interleaving, Samsara leverages the HAV extensions on commodity processors to track the read-set and write-set and thereby implement a chunk-based recording scheme in software. By doing so, we avoid all memory access detection, a major source of overhead in prior work. We implement and evaluate our system in KVM on a commodity Intel Haswell processor. Evaluation results show that, compared with prior software-only schemes, Samsara reduces the log file size to 1/70th on average and reduces the recording overhead from about 10×, as reported by state-of-the-art works, to 2.3× on average.
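To make the chunk-based idea concrete, the following is a minimal sketch of how read-sets and write-sets can be compared to decide whether two concurrently executed chunks interfere. It illustrates the general technique only and is not Samsara's implementation: the `Chunk` structure, the `conflicts` helper, and the use of page numbers as the tracking granularity are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One chunk of execution on a virtual CPU (hypothetical structure)."""
    vcpu: int
    read_set: set = field(default_factory=set)   # pages read during the chunk
    write_set: set = field(default_factory=set)  # pages written during the chunk


def conflicts(a: Chunk, b: Chunk) -> bool:
    """Two concurrent chunks conflict if one wrote a page the other read or wrote."""
    return bool(
        a.write_set & (b.read_set | b.write_set)
        or b.write_set & a.read_set
    )


# Assumed example data: page numbers are arbitrary.
c0 = Chunk(vcpu=0, read_set={0x10, 0x11}, write_set={0x12})
c1 = Chunk(vcpu=1, read_set={0x12}, write_set={0x20})  # reads a page c0 wrote
c2 = Chunk(vcpu=1, read_set={0x10}, write_set={0x21})  # only shares a read page

print(conflicts(c0, c1))  # True
print(conflicts(c0, c2))  # False
```

Because only the per-chunk sets are compared, a recorder built this way would log chunk boundaries and their ordering rather than every individual memory access.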
