OCTET: Practical Concurrency Control for Dynamic Analyses and Systems

Parallel programming is essential for reaping the benefits of parallel hardware, but it is notoriously difficult to develop and debug reliable, scalable software systems. One key challenge is that modern languages and systems provide poor support for ensuring concurrency correctness properties—such as atomicity, sequential consistency, and multithreaded determinism—because all existing approaches are impractical. Dynamic, software-based approaches slow programs by up to an order of magnitude because capturing cross-thread dependences (i.e., conflicting accesses) requires synchronization at every access to potentially shared memory. This paper introduces a new software-based concurrency control mechanism called OCTET that captures cross-thread dependences soundly but avoids synchronization at non-conflicting accesses. OCTET tracks the locality state of each potentially shared object. Non-conflicting accesses conform to the locality state and require no synchronization, but conflicting accesses require a state change with heavyweight synchronization. This optimistic tradeoff performs well for real-world concurrent programs, which by design execute relatively few conflicting accesses. We have implemented a prototype of OCTET in a high-performance Java virtual machine. Our evaluation demonstrates OCTET’s potential for capturing cross-thread dependences with overhead low enough for production systems. OCTET is an appealing and practical concurrency control mechanism for designing low-overhead, sound and precise analyses and systems that check and enforce concurrency correctness properties.
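The locality-state idea described above can be illustrated with a minimal Java sketch. It is not the OCTET implementation: the class `LocalityStateSketch`, the `Tracked` type, and the `barrier`, `read`, `write`, and `demo` methods are all hypothetical names chosen for illustration. The sketch models only a single "owned by one thread" state. An atomic swap stands in for the heavyweight synchronization that a conflicting access requires, so unlike the mechanism in the abstract, this simplification does not by itself capture dependences soundly under truly concurrent conflicting accesses.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class LocalityStateSketch {

    /** Hypothetical per-object metadata plus one payload field. */
    public static final class Tracked {
        // Locality state: the thread that may access the object without a state change.
        final AtomicReference<Thread> owner = new AtomicReference<>();
        // Counts slow-path state changes, for illustration only.
        final AtomicInteger stateChanges = new AtomicInteger();
        int value;
    }

    /** Barrier run before every access to a potentially shared object. */
    static void barrier(Tracked obj) {
        Thread me = Thread.currentThread();
        // Fast path: the access conforms to the locality state, so nothing else happens.
        // (A real barrier would presumably use a plain load here, not a volatile read.)
        if (obj.owner.get() == me) {
            return;
        }
        // Slow path: conflicting access. ASSUMPTION: an atomic swap stands in for the
        // heavyweight coordination with the previous owner described in the abstract.
        obj.owner.getAndSet(me);
        obj.stateChanges.incrementAndGet();
    }

    public static void write(Tracked obj, int v) {
        barrier(obj);
        obj.value = v;
    }

    public static int read(Tracked obj) {
        barrier(obj);
        return obj.value;
    }

    /** Two threads access one object in turn; returns the number of state changes. */
    public static int demo() {
        Tracked obj = new Tracked();
        // First access takes the slow path once; the remaining 999 take the fast path.
        for (int i = 0; i < 1000; i++) {
            write(obj, i);
        }
        // A second thread conflicts once, then also stays on the fast path.
        Thread other = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                write(obj, read(obj) + 1);
            }
        });
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return obj.stateChanges.get();
    }

    public static void main(String[] args) {
        System.out.println("state changes: " + demo()); // prints "state changes: 2"
    }
}
```

In this run, 3,000 barrier executions trigger only two state changes, which mirrors the optimistic tradeoff: the common, non-conflicting case pays for a single comparison, and the rare conflicting case pays for the expensive transition.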
