Automatic detection of extended data-race-free regions

Data-race-free (DRF) parallel programming is becoming a standard, as the memory models newly adopted by mainstream programming languages such as C++ and Java impose data-race-freedom as a requirement. We propose compiler techniques that automatically delineate extended data-race-free (xDRF) regions, namely regions of code that provide the same guarantees as synchronization-free regions do in DRF code. xDRF regions stretch across synchronization boundaries, function calls and loop back-edges while preserving data-race-free semantics, thus exposing more optimization opportunities to the compiler and to the underlying architecture. Our compiler techniques precisely analyze the threads' memory access behavior and data sharing in shared-memory, general-purpose parallel applications, and can therefore infer the limits of xDRF code regions. We evaluate the potential of our technique by employing the xDRF region classification in a state-of-the-art, dual-mode cache coherence protocol. Larger xDRF regions reduce the coherence bookkeeping and enable optimizations that improve performance by 6.8% and energy efficiency by 11.7% compared to a standard directory-based coherence protocol.
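
The difference between synchronization-free regions (SFRs) and regions that extend across synchronization can be illustrated with a toy model. The sketch below is not the compiler analysis described above. It works on one thread's recorded trace rather than on program code, assumes non-nested critical sections, and takes as input a hypothetical set `shared` of variables that other threads may access, standing in for the data-sharing analysis. It applies a deliberately conservative rule of our own: a region keeps growing across critical sections as long as the code outside them touches only data that no other thread accesses. All function and parameter names are illustrative.

```python
from typing import List, Set, Tuple

# Toy model (hypothetical names): one thread's trace of (kind, name) events,
# where kind is "rd", "wr", "acq" or "rel" and name is a variable or a lock.
Event = Tuple[str, str]


def segments(trace: List[Event]) -> List[Tuple[bool, List[Event]]]:
    """Split the trace at every acquire/release into synchronization-free
    segments (SFRs), tagging each one with whether it lies inside a critical
    section. Assumes critical sections are not nested."""
    result: List[Tuple[bool, List[Event]]] = []
    current: List[Event] = []
    in_cs = False
    for kind, name in trace:
        if kind in ("acq", "rel"):
            if current:
                result.append((in_cs, current))
            current = []
            in_cs = kind == "acq"
        else:
            current.append((kind, name))
    if current:
        result.append((in_cs, current))
    return result


def touches(seg: List[Event], shared: Set[str]) -> bool:
    """True if any access in the segment names a variable other threads may access."""
    return any(name in shared for _, name in seg)


def xdrf_regions(trace: List[Event], shared: Set[str]) -> List[List[int]]:
    """Group segment indices into extended regions under the conservative toy
    rule: critical sections are nested into the open region, and the region
    is only closed when code outside a critical section touches `shared` data.
    Such a segment is bounded by its synchronization points and stands alone."""
    regions: List[List[int]] = []
    current: List[int] = []
    for i, (in_cs, seg) in enumerate(segments(trace)):
        if in_cs:
            current.append(i)  # nest the critical section in the open region
        elif touches(seg, shared):
            if current:
                regions.append(current)
            regions.append([i])
            current = []
        else:
            current.append(i)  # private-only code extends the region
    if current:
        regions.append(current)
    return regions


if __name__ == "__main__":
    # Private work on `a`, interleaved with two critical sections on `c`.
    trace = [("wr", "a"), ("rd", "a"), ("acq", "L"), ("rd", "c"), ("wr", "c"),
             ("rel", "L"), ("wr", "a"), ("acq", "L"), ("wr", "c"),
             ("rel", "L"), ("rd", "a")]
    print(len(segments(trace)))         # 5
    print(xdrf_regions(trace, {"c"}))   # [[0, 1, 2, 3, 4]]

    # Same shape, but the code between the critical sections reads shared `b`.
    trace2 = trace[:6] + [("rd", "b")] + trace[7:]
    print(xdrf_regions(trace2, {"b", "c"}))  # [[0, 1], [2], [3, 4]]
```

In the first trace, five SFRs collapse into a single region that spans both critical sections. Once the code between them reads the shared variable `b`, that segment becomes a boundary and three regions remain. The analysis described in the abstract is far more precise than this rule, but the effect it targets is the same: fewer, larger regions, which is what reduces the coherence bookkeeping.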