Improving Memory Utilization in Cache Coherence Directories

Efficiently maintaining cache coherence is a major problem in large-scale shared-memory multiprocessors. Hardware directory coherence schemes have very high memory requirements, while software-directed schemes must rely on imprecise compile-time memory disambiguation. Recently proposed dynamically tagged directory schemes allocate pointers to blocks only as they are referenced, which significantly reduces their memory requirements, but they still allocate pointers to blocks that do not need them. The authors present two compiler optimizations that exploit the high-level sharing information available to the compiler to further reduce the size of a tagged directory by allocating pointers only when necessary. Trace-driven simulations are used to show that the performance of this combined hardware-software approach is comparable to other coherence schemes, but with significantly lower memory requirements. In addition, these simulations suggest that this approach is less sensitive to the quality of the memory disambiguation and interprocedural analysis performed by the compiler than software-only coherence schemes.
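The idea of allocating directory pointers only when necessary can be illustrated with a small sketch. This is not the authors' design: the abstract does not describe the two optimizations in detail, so the class `TaggedDirectory`, the `needs_pointer` hint, and the first-in eviction policy below are all hypothetical, chosen only to show how a compiler-supplied hint could keep blocks that need no coherence tracking out of a tagged directory.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedDirectory:
    """Toy tagged directory: an entry exists only for a block that holds pointers.

    Hypothetical model for illustration; not the scheme evaluated in the paper.
    """
    capacity: int                                 # number of directory entries
    entries: dict = field(default_factory=dict)   # block address -> set of CPU ids
    skipped: int = 0                              # references that stored no pointer

    def reference(self, block: int, cpu: int, needs_pointer: bool = True) -> bool:
        """Record that `cpu` cached `block`; return True if a pointer was stored.

        `needs_pointer` stands in for a compiler hint (assumed): False means the
        compiler determined the block needs no hardware tracking.
        """
        if not needs_pointer:
            self.skipped += 1
            return False
        if block not in self.entries:
            if len(self.entries) >= self.capacity:
                # Assumed policy: evict the oldest entry. A real scheme would
                # also have to invalidate the cached copies of the victim.
                victim = next(iter(self.entries))
                del self.entries[victim]
            self.entries[block] = set()
        self.entries[block].add(cpu)
        return True

    def write(self, block: int, cpu: int) -> list:
        """Return the CPUs to invalidate; the writer becomes the only sharer."""
        sharers = self.entries.get(block, set())
        to_invalidate = sorted(sharers - {cpu})
        if block in self.entries:
            self.entries[block] = {cpu}
        return to_invalidate
```

In this sketch, a reference made with `needs_pointer=False` consumes no directory entry at all, which is the sense in which compiler information could shrink the directory beyond what on-demand allocation alone achieves.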
