An analytic study of dynamic hardware and software cache coherence strategies

Dynamic software cache coherence strategies use information about program sharing behaviour to manage caches at run-time, at a granularity defined by the application. This program-level information is obtained through annotations placed in the application by the user or the compiler. The coherence protocols range from simple static algorithms to dynamic algorithms that use run-time data structures similar to the directories used in hardware strategies. In this paper, we present an analytic study of five dynamic software cache coherence algorithms and compare them to a representative hardware coherence strategy. The analytic model is constructed from four input parameters (write probability, locality, granularity, and system size) and is solved by analysis of a Markov chain. We show that this model captures the fundamental tradeoffs between the different hardware and software strategies. The results show that hardware schemes perform better for fine-grained data structures over much of the parameter space that we study. For coarse-grained data structures, however, various software algorithms are dominant over most of the parameter space. Further, hardware strategies are found to be more susceptible to the effects of contention, and they also perform worse for the asymmetric workload that we study.
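To make the modelling approach concrete, the following is a minimal sketch of how a Markov chain can be solved for its steady-state distribution. It is not the model used in the paper: the two-state chain (a cached block that is either valid or invalid), the function names, and the way write probability, system size, and locality enter the transition probabilities are all assumptions chosen for illustration only. Granularity is left out of this toy chain.

```python
def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Stationary distribution of a row-stochastic matrix P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of pi <- pi * P
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


def toy_block_chain(write_prob, n_procs, locality):
    """Hypothetical two-state chain for one cached block (illustration only).

    State 0: the local copy is valid. State 1: the local copy is invalid.
    Assumption: the copy is invalidated in a step if at least one of the
    other n_procs - 1 processors writes the block, and an invalid copy is
    refetched with probability `locality`.
    """
    p_inv = 1.0 - (1.0 - write_prob) ** (n_procs - 1)
    return [
        [1.0 - p_inv, p_inv],
        [locality, 1.0 - locality],
    ]


if __name__ == "__main__":
    P = toy_block_chain(write_prob=0.1, n_procs=4, locality=0.5)
    pi = stationary_distribution(P)
    print("long-run fraction of time the copy is valid:", pi[0])
```

In this toy chain the long-run probability that the copy is valid has the closed form `locality / (p_inv + locality)`, which gives a quick check on the numerical solution. A study of the kind described above would use a richer state space and derive costs from the steady-state probabilities.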
