Automatically classifying benign and harmful data races using replay analysis

Many concurrency bugs in multi-threaded programs are due to dataraces. There have been many efforts to develop static and dynamic mechanisms to automatically find the data races. Most of the prior work has focused on finding the data races and eliminating the false positives. In this paper, we instead focus on a dynamic analysis technique to automatically classify the data races into two categories - the dataraces that are potentially benign and the data races that are potentially harmful. A harmful data race is a real bug that needs to be fixed. This classification is needed to focus the triaging effort on those data races that are potentially harmful. Without prioritizing the data races we have found that there are too many data races to triage. Our second focus is to automatically provide to the developer a reproducible scenario of the data race, which allows the developer to understand the different effects of a harmful data race on a program's execution. To achieve the above, we record a multi-threaded program's execution in a replay log. The replay log is used to replay the multi-threaded program, and during replay we find the data races using a happens-before based algorithm. To automatically classify if a data race that we find is potentially benign or potentially harmful, were play the execution twice for a given data race - one for each possible order between the conflicting memory operations. If the two replays for the two orders produce the same result, then we classify the data race to be potentially benign. We discuss our experiences in using our replay based dynamic data race checker on several Microsoft applications.

[1]  Yuanyuan Zhou,et al.  AVIO: Detecting Atomicity Violations via Access-Interleaving Invariants , 2007, IEEE Micro.

[2]  Jeffrey S. Foster,et al.  LOCKSMITH: context-sensitive correlation analysis for race detection , 2006, PLDI '06.

[3]  Jong-Deok Choi,et al.  Techniques for debugging parallel programs with flowback analysis , 1991, TOPL.

[4]  Min Xu,et al.  A serializability violation detector for shared-memory server programs , 2005, PLDI '05.

[5]  Thomas R. Gross,et al.  Object race detection , 2001, OOPSLA '01.

[6]  Nicholas Sterling,et al.  WARLOCK - A Static Data Race Analysis Tool , 1993, USENIX Winter.

[7]  Thomas A. Henzinger,et al.  Race checking by context inference , 2004, PLDI '04.

[8]  Martin C. Rinard,et al.  ACM Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA), November 2002 Ownership Types for Safe Programming: Preventing Data Races and Deadlocks , 2022 .

[9]  Edith Schonberg,et al.  Detecting access anomalies in programs with critical sections , 1991, PADD '91.

[10]  Edith Schonberg,et al.  An empirical comparison of monitoring algorithms for access anomaly detection , 2011, PPOPP '90.

[11]  John M. Mellor-Crummey,et al.  On-the-fly detection of data races for programs with nested fork-join parallelism , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[12]  Koen De Bosschere,et al.  Non-intrusive on-the-fly data race detection using execution replay , 2000, AADEBUG.

[13]  Michael Burrows,et al.  Eraser: a dynamic data race detector for multithreaded programs , 1997, TOCS.

[14]  Assaf Schuster,et al.  Efficient on-the-fly data race detection in multithreaded C++ programs , 2003, PPoPP '03.

[15]  Dinghao Wu,et al.  KISS: keep it simple and sequential , 2004, PLDI '04.

[16]  Alexander Aiken,et al.  Effective static race detection for Java , 2006, PLDI '06.

[17]  Robert H. B. Netzer Optimal tracing and replay for debugging shared-memory parallel programs , 1993, PADD '93.

[18]  Rahul Agarwal,et al.  Automated type-based analysis of data races and atomicity , 2005, PPoPP.

[19]  Peter J. Keleher,et al.  Online data-race detection via coherency guarantees , 1996, OSDI '96.

[20]  Edith Schonberg,et al.  On-the-fly detection of access anomalies , 2018, PLDI '89.

[21]  Hiroyasu Nishiyama,et al.  Detecting Data Races Using Dynamic Escape Analysis Based on Read Barrier , 2004, Virtual Machine Research and Technology Symposium.

[22]  Sanjay Bhansali,et al.  Framework for instruction-level tracing and analysis of program executions , 2006, VEE '06.

[23]  Leslie Lamport,et al.  Time, clocks, and the ordering of events in a distributed system , 1978, CACM.

[24]  James R. Larus,et al.  Protocol-based data-race detection , 1998, SPDT '98.

[25]  Barton P. Miller,et al.  Detecting Data Races on Weak Memory Systems , 1991, ISCA.

[26]  Koen De Bosschere,et al.  RecPlay: a fully integrated practical record/replay system , 1999, TOCS.

[27]  Jong-Deok Choi,et al.  Race Frontier: reproducing data races in parallel-program debugging , 1991, PPOPP '91.

[28]  Jong-Deok Choi,et al.  Hybrid dynamic data race detection , 2003, PPoPP '03.

[29]  Dawson R. Engler,et al.  RacerX: effective, static detection of race conditions and deadlocks , 2003, SOSP '03.

[30]  Jong-Deok Choi,et al.  An efficient cache-based access anomaly detection scheme , 1991, ASPLOS IV.

[31]  Serdar Tasiran,et al.  VYRD: verifYing concurrent programs by runtime refinement-violation detection , 2005, PLDI '05.

[32]  Rahul Agarwal,et al.  Optimized run-time race detection and atomicity checking using partial discovered types , 2005, ASE.

[33]  Josep Torrellas,et al.  ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes , 2003, ISCA '03.

[34]  Mark Moir,et al.  Lock-free reference counting , 2001, PODC '01.

[35]  Jong-Deok Choi,et al.  Efficient and precise datarace detection for multithreaded object-oriented programs , 2002, PLDI '02.

[36]  Stephen N. Freund,et al.  Type-based race detection for Java , 2000, PLDI '00.