The shared regions approach to software cache coherence on multiprocessors

The effective management of caches is critical to the performance of applications on shared-memory multiprocessors. In this paper, we discuss a technique for software cache coherence that is based upon the integration of a program-level abstraction for shared data with software cache management. The program-level abstraction, called Shared Regions, explicitly relates synchronization objects with the data they protect. Cache coherence algorithms are presented which use the information provided by shared region primitives, and ensure that shared regions are always cacheable by the processors accessing them. Measurements from experiments with the Shared Regions approach on a shared-memory multiprocessor are presented. Comparisons with other software-based coherence strategies, including a user-controlled strategy and an operating system-based strategy, show that this approach is able to deliver better performance, with relatively low overhead and only a small increase in programming effort. Compared to a compiler-based coherence strategy, the Shared Regions approach still performs better than a compiler that can achieve 90% accuracy in allowing caching, as long as the regions are a few hundred bytes or larger, or they are re-used a few times in the cache.
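As a rough illustration of the idea of binding a synchronization object to the data it protects, the following toy sketch models a region whose acquire and release operations also drive software cache management. Every name here (`SharedRegion`, `Processor`, `acquire`, `release`) and the version-counter staleness check are assumptions made for illustration; they are not the paper's actual primitives or coherence algorithms.

```python
import threading


class SharedRegion:
    """Hypothetical region: a lock bundled with the data it protects."""

    def __init__(self, size):
        self.lock = threading.Lock()
        self.memory = bytearray(size)  # stands in for main memory
        self.version = 0               # bumped on every write-back (assumed scheme)


class Processor:
    """Hypothetical processor with a software-managed private cache."""

    def __init__(self):
        self.cache = {}    # region -> (version seen, local copy)
        self.fetches = 0   # number of times data was refetched from memory

    def acquire(self, region):
        # Taking the lock is also the point where coherence is enforced.
        region.lock.acquire()
        entry = self.cache.get(region)
        if entry is None or entry[0] != region.version:
            # Cached copy is absent or stale: refetch from main memory.
            self.cache[region] = (region.version, bytearray(region.memory))
            self.fetches += 1
        return self.cache[region][1]

    def release(self, region, dirty):
        if dirty:
            # Write the local copy back before other processors can acquire.
            _, copy = self.cache[region]
            region.memory[:] = copy
            region.version += 1
            self.cache[region] = (region.version, copy)
        region.lock.release()


region = SharedRegion(4)
p0, p1 = Processor(), Processor()

data = p0.acquire(region)
data[0] = 7
p0.release(region, dirty=True)

seen_by_p1 = p1.acquire(region)[0]   # p1 observes p0's write
p1.release(region, dirty=False)

p0.acquire(region)                   # p0's copy is still valid, no refetch
p0.release(region, dirty=False)
```

Because the region stays in the cache between acquisitions and is refetched only when another processor has written it, the sketch shows in miniature why re-use of a cached region can pay off.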
