Techniques for Compiler-Directed Cache Coherence

The performance of large-scale shared-memory multiprocessors can be greatly improved if remote shared data can be cached in the processors' private caches. However, maintaining cache coherence in such systems remains a challenge. Although hardware directory schemes give good performance, they may be too complex and expensive for large-scale multiprocessors. This article provides a comprehensive guide to an alternative approach: compiler-directed cache coherence. Compiler-directed techniques let each processor maintain the coherence of its own cache locally, eliminating the need for directory hardware and interprocessor communication. We survey state-of-the-art software and hardware compiler-directed techniques and discuss their basic concepts and issues. We also demonstrate the feasibility and performance of compiler-directed cache coherence through a case study of the Two-Phase Invalidation scheme.
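The idea of keeping a cache coherent locally can be illustrated with a toy model. The sketch below is not the Two-Phase Invalidation scheme or any scheme from the article; the `Processor` class, the `run` function, the write-through policy, and the three-epoch program are all assumptions made for illustration. It only shows how an invalidation placed by a compiler at an epoch boundary lets a processor avoid a stale read without consulting a directory or another processor.

```python
class Processor:
    """Toy processor with a private write-through cache (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory   # shared main memory, modeled as a dict
        self.cache = {}        # private cache: address -> value

    def read(self, addr):
        if addr not in self.cache:              # miss: fetch from shared memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]                 # hit: may be stale

    def write(self, addr, value):
        self.cache[addr] = value
        self.memory[addr] = value               # write-through to shared memory

    def invalidate(self, addrs):
        # Purely local action: drop possibly stale copies from this cache only.
        for addr in addrs:
            self.cache.pop(addr, None)


def run(with_invalidation):
    memory = {"x": 0}
    p0, p1 = Processor(memory), Processor(memory)

    p0.read("x")                # epoch 1: P0 caches x == 0
    p1.write("x", 42)           # epoch 2: P1 updates x
    if with_invalidation:
        # Assumed compiler analysis: x may have been written by another
        # processor in an earlier epoch, so invalidate it before reading.
        p0.invalidate(["x"])
    return p0.read("x")         # epoch 3: P0 reads x again


stale = run(with_invalidation=False)   # P0 reuses its old cached copy
fresh = run(with_invalidation=True)    # P0 refetches from shared memory
```

In this model the stale read returns the old value and the invalidated read returns the updated one; no message is ever sent between `p0` and `p1`.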
