Comparison of hardware and software cache coherence schemes

We use mean value analysis models to compare representative hardware and software cache coherence schemes for a large-scale shared-memory system. Our goal is to identify the workloads for which either of the schemes is significantly better. Our methodology improves upon previous analytical studies and complements previous simulation studies by developing a common high-level workload model that is used to derive separate sets of low-level workload parameters for the two schemes. This approach allows an equitable comparison of the two schemes for a specific workload. Our results show that software schemes are comparable (in terms of processor efficiency) to hardware schemes for a wide class of programs. The only cases for which software schemes perform significantly worse than ...

Software cache coherence is attractive because the overhead of detecting stale data is transferred from runtime to compile time, and the design complexity is transferred from hardware to software. However, software schemes may perform poorly because compile-time analysis may need to be conservative, leading to unnecessary cache misses and main memory updates. In this paper, we use approximate Mean Value Analysis [VLZ88] to compare the performance of a representative software scheme with a directory-based hardware scheme on a large-scale shared-memory system.

In a previous study comparing the performance of hardware and software coherence, Cheong and Veidenbaum used a parallelizing compiler to implement three different software coherence schemes [Che90]. For selected subroutines of seven programs, they show that the hit ratio of their most sophisticated software scheme (version con...
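To make the mean value analysis mentioned above concrete, the following is a minimal sketch of the textbook exact MVA recursion for a single-class closed queueing network. It is not the model used in the paper, which relies on an approximate variant whose equations are not reproduced here. The function name `processor_efficiency`, its parameters, and the definition of efficiency as the fraction of time a processor spends computing rather than waiting are all assumptions made for illustration.

```python
def processor_efficiency(n_processors, think_time, demands):
    """Exact single-class MVA for a closed queueing network (illustrative only).

    n_processors: number of processors circulating in the network (hypothetical).
    think_time:   mean compute time between memory requests (a delay center).
    demands:      mean service demand per request at each queueing center,
                  e.g. an interconnect and a memory module (hypothetical).
    Returns the assumed efficiency: think_time / (think_time + response time).
    """
    if n_processors < 1 or think_time <= 0:
        raise ValueError("need at least one processor and a positive think time")

    queue = [0.0] * len(demands)  # mean queue lengths with zero customers
    response = 0.0
    for n in range(1, n_processors + 1):
        # Arrival theorem: an arriving request sees the queue lengths of the
        # same network with one fewer customer.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        # Little's law applied at each queueing center.
        queue = [throughput * r for r in residence]
    return think_time / (think_time + response)


if __name__ == "__main__":
    # Hypothetical workload: 9 time units of computation per request,
    # 1 time unit of service at a single shared resource.
    for n in (1, 2, 16):
        print(n, processor_efficiency(n, 9.0, [1.0]))
```

In a sketch like this, a comparison of two coherence schemes would amount to evaluating the same network with two different sets of service demands derived from one common workload description.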

[1]  D J Kuck,et al.  Parallel Supercomputing Today and the Cedar Approach , 1986, Science.

[2]  James K. Archibald,et al.  Cache coherence protocols: evaluation using a multiprocessor simulation model , 1986, TOCS.

[3]  Anoop Gupta,et al.  Analysis of cache invalidation patterns in multiprocessors , 1989, ASPLOS III.

[4]  Mary K. Vernon,et al.  Comparison of hardware and software cache coherence schemes , 1991, [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture.

[5]  Mary K. Vernon,et al.  An accurate and efficient performance analysis technique for multiprocessor snooping cache-consistency protocols , 1988, ISCA '88.

[6]  Randy H. Katz,et al.  The effect of sharing on the cache and bus performance of parallel programs , 1989, ASPLOS III.

[7]  Randy H. Katz,et al.  Implementing a cache consistency protocol , 1985, ISCA '85.

[8]  Anant Agarwal,et al.  Evaluating the performance of software cache coherence , 1989, ASPLOS 1989.

[9]  Jean-Loup Baer,et al.  Cache coherence protocols: evaluation using a multiprocessor simulation model , 1986 .

[10]  Derek L. Eager,et al.  An analytic model of multistage interconnection networks , 1990, SIGMETRICS '90.

[11]  Randy H. Katz,et al.  Implementing a cache consistency protocol , 1985, ISCA 1985.

[12]  Alexander V. Veidenbaum,et al.  A cache coherence scheme with fast selective invalidation , 1988, ISCA '88.

[13]  Anant Agarwal,et al.  Evaluating the performance of software cache coherence , 1989, ASPLOS III.

[14]  Sang Lyul Min,et al.  A Performance Comparison of Directory-based and Timestamp-based Cache Coherence Schemes , 1990, ICPP.

[15]  Michael L. Scott,et al.  Algorithms for scalable synchronization on shared-memory multiprocessors , 1991, TOCS.

[16]  Sang Lyul Min,et al.  Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps , 1992, IEEE Trans. Parallel Distributed Syst..

[17]  Hye-yeon Cheong.  Compiler-directed cache coherence strategies for large-scale sha , 1990 .

[18]  Kevin P. McAuliffe,et al.  Automatic Management of Programmable Caches , 1988, ICPP.