VISU: A Simple and Efficient Cache Coherence Protocol Based on Self-updating

Existing cache coherence protocols impose high overheads on shared-memory systems and significantly reduce system efficiency. For example, the widely used snooping protocol broadcasts messages and therefore consumes substantial network bandwidth, while the directory protocol requires large amounts of storage to track sharers. Furthermore, these protocols have numerous transient states to handle various races, which makes them harder to implement and verify. To mitigate these issues, this paper proposes VISU, a simple and efficient two-state (Valid and Invalid) cache coherence protocol for data-race-free programs. To simplify the design, we adopt distinct schemes for private and shared data. Since private data does not require coherence maintenance, we apply a simple write-back policy to it. For shared data, we use a write-through policy so that the last-level cache always holds the up-to-date data. At synchronization points, a self-updating mechanism updates stale copies in the L1 caches, which eliminates the need for broadcast communication or a directory.
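The Python sketch below models the two data policies and the self-updating step for per-core L1 caches in front of a single shared last-level cache (LLC). It is illustrative only and is not VISU's hardware design: the class and method names (`LLC`, `L1Cache`, `sync`, etc.) are hypothetical, the private/shared classification is assumed to be known in advance, and the self-update is assumed to simply refresh every valid shared line from the LLC. Timing, the interconnect, and acquire/release distinctions are ignored.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    INVALID = 0
    VALID = 1


@dataclass
class Line:
    state: State
    value: int
    dirty: bool = False  # only set for private (write-back) lines


class LLC:
    """Shared last-level cache; shared data is always up to date here."""

    def __init__(self) -> None:
        self.data: dict[int, int] = {}

    def read(self, addr: int) -> int:
        return self.data.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        self.data[addr] = value


@dataclass
class L1Cache:
    llc: LLC
    shared_addrs: set[int]  # assumption: classification is given up front
    lines: dict[int, Line] = field(default_factory=dict)

    def load(self, addr: int) -> int:
        line = self.lines.get(addr)
        if line is None or line.state is State.INVALID:
            # Miss: fetch from the LLC and install the line as Valid.
            line = Line(State.VALID, self.llc.read(addr))
            self.lines[addr] = line
        return line.value

    def store(self, addr: int, value: int) -> None:
        if addr in self.shared_addrs:
            # Shared data: write-through, so the LLC always holds the latest value.
            self.lines[addr] = Line(State.VALID, value)
            self.llc.write(addr, value)
        else:
            # Private data: write-back, no coherence action needed.
            self.lines[addr] = Line(State.VALID, value, dirty=True)

    def evict(self, addr: int) -> None:
        line = self.lines.pop(addr, None)
        if line is not None and line.dirty:
            # Dirty private data reaches the LLC only on eviction.
            self.llc.write(addr, line.value)

    def sync(self) -> None:
        """Self-update at a synchronization point (assumed: refresh all valid shared lines)."""
        for addr, line in self.lines.items():
            if addr in self.shared_addrs and line.state is State.VALID:
                line.value = self.llc.read(addr)


if __name__ == "__main__":
    llc = LLC()
    X, P = 0x100, 0x200  # X is shared; P is private to core 0
    c0, c1 = L1Cache(llc, {X}), L1Cache(llc, {X})

    c1.load(X)           # core 1 caches X = 0
    c0.store(X, 42)      # write-through: the LLC now holds 42
    stale = c1.load(X)   # core 1 has not synchronized yet
    c1.sync()            # self-update at the synchronization point
    fresh = c1.load(X)
    print(stale, fresh)  # 0 42

    c0.store(P, 7)       # write-back: the LLC is not updated yet
    before = llc.read(P)
    c0.evict(P)
    print(before, llc.read(P))  # 0 7
```

The stale read of `X` before `sync()` is the kind of access a data-race-free program never performs, since it would only read `X` after synchronizing with the writer. This is why the model can omit invalidation messages and sharer tracking entirely.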
