Samsara: Efficient Deterministic Replay with Hardware Virtualization Extensions

Deterministic replay, which provides the ability to travel backward in time and reconstructs the past execution flow of a multi-processor system, has many prominent applications including cyclic debugging, intrusion detection, malware analysis, and fault tolerance. Previous software-only schemes cannot take advantage of modern hardware support for replay and suffer from excessive performance overhead. They also produce huge log sizes due to the inherent draw-backs of the point-to-point logging approach used. In this paper, we propose a novel approach, called Samsara, which uses hardware-assisted virtualization (HAV) extensions to achieve an efficient software-based replay system. Unlike previous software-only schemes that record dependences between individual instructions, we record processors' execution as a series of chunks. By leveraging HAV extensions, we avoid the large number of memory access detections which are a major source of overhead in the previous work and instead perform a single extended page table (EPT) traversal at the end of each chunk. We have implemented and evaluated our system on KVM with Intel's Haswell processor. Evaluation results show that our system incurs less than 3X overhead when compared to native execution with two processors while the overhead in other state-of-the-art work is much more than 10X. Our system improves recording performance dramatically with a log size even smaller than that in hardware-based scheme.

[1]  Josep Torrellas,et al.  DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Ef?ciently , 2008, International Symposium on Computer Architecture.

[2]  Haibo Chen,et al.  Scalable deterministic replay in a parallel full-system emulator , 2013, PPoPP '13.

[3]  Satish Narayanasamy,et al.  DoublePlay: parallelizing sequential logging and replay , 2011, ASPLOS XVI.

[4]  Thomas J. LeBlanc,et al.  Debugging Parallel Programs with Instant Replay , 1987, IEEE Transactions on Computers.

[5]  Min Xu ReTrace : Collecting Execution Trace with Virtual Machine Deterministic Replay , 2007 .

[6]  Min Xu,et al.  A "flight data recorder" for enabling full-system multiprocessor deterministic replay , 2003, ISCA '03.

[7]  Christian Bienia,et al.  Benchmarking modern multiprocessors , 2011 .

[8]  Jun Zhu,et al.  Twinkle: A fast resource provisioning mechanism for internet services , 2011, 2011 Proceedings IEEE INFOCOM.

[9]  Satish Narayanasamy,et al.  Respec: efficient online multiprocessor replayvia speculation and external determinism , 2010, ASPLOS XV.

[10]  Josep Torrellas,et al.  RelaxReplay: record and replay for relaxed-consistency multiprocessors , 2014, ASPLOS.

[11]  James Cownie,et al.  PinPlay: a framework for deterministic replay and reproducible analysis of parallel programs , 2010, CGO '10.

[12]  R. Bodík,et al.  A "flight data recorder" for enabling full-system multiprocessor deterministic replay , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[13]  Samuel T. King,et al.  ReVirt: enabling intrusion analysis through virtual-machine logging and replay , 2002, OPSR.

[14]  Jun Zhu,et al.  Optimizing the Performance of Virtual Machine Synchronization for Fault Tolerance , 2011, IEEE Transactions on Computers.

[15]  Josep Torrellas,et al.  DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Ef?ciently , 2008, 2008 International Symposium on Computer Architecture.

[16]  Peter M. Chen,et al.  Execution replay of multiprocessor virtual machines , 2008, VEE '08.

[17]  Marek Olszewski,et al.  Kendo: efficient deterministic multithreading in software , 2009, ASPLOS.

[18]  Ion Stoica,et al.  ODR: output-deterministic replay for multicore debugging , 2009, SOSP '09.

[19]  Srikanth Kandula,et al.  Flashback: A Lightweight Extension for Rollback and Deterministic Replay for Software Debugging , 2004, USENIX Annual Technical Conference, General Track.

[20]  Satish Narayanasamy,et al.  Recording shared memory dependencies using strata , 2006, ASPLOS XII.

[21]  Yuanyuan Zhou,et al.  PRES: probabilistic replay with execution sketching on multiprocessors , 2009, SOSP '09.