Compiler and Hardware Support for Cache Coherence in Large-Scale Multiprocessors: Design Considerations and Performance Study

In this paper, we study a hardware-supported, compiler-directed (HSCD) cache coherence scheme that can be implemented on a large-scale multiprocessor built from off-the-shelf microprocessors, such as the Cray T3D. The scheme can be adapted to various cache organizations, including multi-word cache lines and byte-addressable architectures. We also address several system-related issues, including critical sections, inter-thread communication, and task migration. The cost of the required hardware support is small and proportional to the cache size. The necessary compiler algorithms, including intra- and interprocedural array data-flow analysis, have been implemented in the Polaris compiler [17]. In our simulation study using the Perfect Club benchmarks, we found that, despite the conservative analysis performed by the compiler, the performance of the proposed HSCD scheme can be comparable to that of a full-map hardware directory scheme. With comparable performance and reduced hardware cost, the scheme can be a viable alternative for large-scale multiprocessors, such as the Cray T3D, that rely on users to maintain data coherence.

[1]  Rudolf Eigenmann,et al.  Polaris: A New-Generation Parallelizing Compiler for MPPs , 1993 .

[2]  Pen-Chung Yew,et al.  Hardware and compiler support for cache coherence in large-scale shared-memory multiprocessors , 1996 .

[3]  Michael J. Flynn,et al.  An area model for on-chip memories and its application , 1991 .

[4]  Thomas G. Robertazzi,et al.  The Performance of Multistage Interconnection Networks for Multiprocessors , 1993 .

[5]  David J. Lilja,et al.  Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons , 1993, CSUR.

[6]  Arthur B. Maccabe,et al.  The program dependence web: a representation supporting control-, data-, and demand-driven interpretation of imperative languages , 1990, PLDI '90.

[7]  Marc Snir,et al.  The Performance of Multistage Interconnection Networks for Multiprocessors , 1983, IEEE Transactions on Computers.

[8]  Dean M. Tullsen,et al.  Limitations of cache prefetching on a bus-based multiprocessor , 1993, ISCA '93.

[9]  Alexander V. Veidenbaum,et al.  Compiler-directed cache management in multiprocessors , 1990, Computer.

[10]  Yung-Chin Chen,et al.  Cache Design and Performance in a Large-Scale Shared-Memory Multiprocessor System , 1993 .

[11]  Paul Feautrier,et al.  A New Solution to Coherence Problems in Multicache Systems , 1978, IEEE Transactions on Computers.

[12]  Hoichi Cheong,et al.  Life span strategy—a compiler-based approach to cache coherence , 1992, ICS '92.

[13]  Sang Lyul Min,et al.  Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps , 1992, IEEE Trans. Parallel Distributed Syst..

[14]  K. Kennedy,et al.  Cache coherence using local knowledge , 1993, Supercomputing '93.

[15]  Qing Yang,et al.  CAT - caching address tags - a technique for reducing area cost of on-chip caches , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[16]  Vikram S. Adve,et al.  Comparison of hardware and software cache coherence schemes , 1991, Proceedings of the 18th Annual International Symposium on Computer Architecture.

[17]  Josep Torrellas,et al.  False Sharing and Spatial Locality in Multiprocessor Caches , 1994, IEEE Trans. Computers.

[18]  Pen-Chung Yew,et al.  A compiler-directed cache coherence scheme with improved intertask locality , 1994, Proceedings of Supercomputing '94.

[19]  D. K. Poulsen,et al.  Execution-driven tools for parallel simulation of parallel architectures and applications , 1993, Supercomputing '93.

[20]  Tzi-cker Chiueh,et al.  A Generational Algorithm to Multiprocessor Cache Coherence , 1993, 1993 International Conference on Parallel Processing - ICPP'93.

[21]  Dean M. Tullsen,et al.  Limitations Of Cache Prefetching On A Bus-based Multiprocessor , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.

[22]  Thomas J. LeBlanc,et al.  Adjustable block size coherent caches , 1992, ISCA '92.

[23]  Mary K. Vernon,et al.  Comparison of hardware and software cache coherence schemes , 1991, ISCA '91.

[24]  Kyoritsu Shuppan Co., Ltd.  Computer Science: ACM Computing Surveys , 1978 .

[25]  Alexander V. Veidenbaum,et al.  A Compiler-Assisted Cache Coherence Solution for Multiprocessors , 1986, ICPP.