Sparse record and replay with controlled scheduling

Modern applications include many sources of nondeterminism, e.g. due to concurrency, signals, and system calls that interact with the external environment. Finding and reproducing bugs in the presence of this nondeterminism has been the subject of much prior work in three main areas: (1) controlled concurrency testing, where a custom scheduler replaces the OS scheduler to find subtle bugs; (2) record and replay, where sources of nondeterminism are captured and logged so that a failing execution can be replayed for debugging purposes; and (3) dynamic analysis for the detection of data races. We present tsan11rec, a dynamic analysis tool for C++ applications that brings these strands of work together by integrating controlled concurrency testing and record and replay into the tsan11 framework for C++11 data race detection. Our novel twist on record and replay is a sparse approach, where the sources of nondeterminism to record can be configured per application. We show that our approach is effective at finding subtle concurrency bugs in small applications; is competitive in performance with the state-of-the-art record and replay tool rr on larger applications; succeeds (due to our sparse approach) in replaying the I/O-intensive Zandronum and QuakeSpasm video games, which are out of scope for rr; but (due to limitations of our sparse approach) cannot faithfully replay applications where memory layout nondeterminism significantly affects application behaviour.
