Race condition detection for debugging of shared-memory parallel programs

This thesis addresses theoretical and practical aspects of the dynamic detection and debugging of race conditions in shared-memory parallel programs. To reason about race conditions, we present a formal model that characterizes actual, observed, and potential behaviors of the program. The actual behavior precisely represents the program execution, the observed behavior represents partial information that can be reasonably recorded, and the potential behavior represents alternate executions possibly allowed by nondeterministic timing variations. These behaviors are used to characterize two types of race conditions, general races and data races, which pertain to different classes of parallel programs and require different detection techniques. General races apply to programs intended to be deterministic; data races apply to nondeterministic programs containing critical sections. We prove that, for executions using synchronization powerful enough to implement two-process mutual exclusion, locating every general race or data race is an NP-hard problem. However, for data races, we show that detecting only a subset of all races is sufficient for debugging. We also prove that, for weaker types of synchronization, races can be located efficiently. We present post-mortem algorithms for detecting race conditions as accurately as possible, given the constraint of limited run-time information. We characterize those races that are direct manifestations of bugs and not artifacts caused by other races, imprecise run-time traces (causing false races to appear real), or unintentional synchronization (caused by shared-memory references). Our techniques analyze the observed behavior to conservatively locate races that either did occur or had the potential of occurring, and that were unaffected by any other race in the execution. Finally, we describe a prototype data race detector that we used to analyze a sample collection of parallel programs.
Experiments indicate that our techniques effectively pinpoint non-artifact races, directing the programmer to parts of the execution containing direct manifestations of bugs. In all programs analyzed, our techniques reduced hundreds to thousands of races down to four or fewer that required investigation.
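
To make the idea of post-mortem data race detection concrete, the following is a minimal sketch, not the thesis's algorithm. It applies the standard vector-clock happens-before check to a recorded trace and reports pairs of conflicting accesses that the observed synchronization leaves unordered. The trace format (tuples of thread, event kind, and target name), the function name `find_data_races`, and the sample trace are all assumptions made for illustration. The sketch does not attempt to separate artifact races from non-artifact ones.

```python
from itertools import combinations


def find_data_races(trace):
    """Return (index_a, index_b, variable) for unordered conflicting accesses.

    `trace` is a hypothetical recorded execution: a list of
    (thread, kind, target) tuples in observed order, where kind is one of
    "read", "write", "acquire", "release".
    """
    clocks = {}       # thread name -> vector clock (dict: thread -> int)
    lock_clocks = {}  # lock name -> vector clock stored at its last release
    accesses = []     # (index, thread, kind, variable, clock snapshot)

    for index, (thread, kind, target) in enumerate(trace):
        clock = clocks.setdefault(thread, {thread: 1})
        if kind == "acquire":
            # Join with the clock of the last release of this lock, if any.
            for t, c in lock_clocks.get(target, {}).items():
                clock[t] = max(clock.get(t, 0), c)
        elif kind == "release":
            # Publish the current clock, then start a new local epoch.
            lock_clocks[target] = dict(clock)
            clock[thread] += 1
        else:
            accesses.append((index, thread, kind, target, dict(clock)))

    def ordered(a, b):
        # a happens before b if b has seen a's epoch on a's own thread.
        _, thread_a, _, _, clock_a = a
        return clock_a[thread_a] <= b[4].get(thread_a, 0)

    races = []
    for a, b in combinations(accesses, 2):
        same_variable = a[3] == b[3]
        different_threads = a[1] != b[1]
        has_write = "write" in (a[2], b[2])
        if (same_variable and different_threads and has_write
                and not ordered(a, b) and not ordered(b, a)):
            races.append((a[0], b[0], a[3]))
    return races


# Hypothetical trace: x is accessed without synchronization,
# y is accessed only inside critical sections guarded by lock m.
trace = [
    ("T1", "write", "x"),
    ("T2", "read", "x"),
    ("T1", "acquire", "m"),
    ("T1", "write", "y"),
    ("T1", "release", "m"),
    ("T2", "acquire", "m"),
    ("T2", "write", "y"),
    ("T2", "release", "m"),
]

print(find_data_races(trace))
```

In this sample trace the two accesses to `x` are reported because no synchronization orders them, while the accesses to `y` are ordered by the release and acquire of `m`. A check of this kind reasons only from the ordering that was actually observed, which is one reason the abstract distinguishes observed from potential behavior.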
