The coherence predictor cache: a resource-efficient and accurate coherence prediction infrastructure

Two-level coherence predictors have shown great promise for reducing coherence overhead in shared-memory multiprocessors. However, to be accurate they require a memory overhead that, on a 64-processor machine for example, can be as high as 50%. Based on a case study of seven applications from SPLASH-2, a first observation made in this paper is that memory blocks subject to coherence activity usually constitute only a small fraction (around 10%) of the entire application footprint. Building on this observation, we contribute a new class of resource-efficient coherence predictors organized as a cache attached to each memory controller. We show that such a Coherence Predictor Cache (CPC) can provide predictions nearly as effective as those obtained when a predictor is associated with every memory block, while needing only 2-7% as many predictors.
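
To make the organization concrete, the following is a minimal Python sketch of a predictor cache attached to one memory controller. It is not the paper's design. It assumes a generic two-level scheme in the style of Cosmos: each block has a history of its last `depth` coherence messages (level 1), and that history indexes a table of the messages that followed it before (level 2). It also assumes a set-associative layout indexed by block address with LRU replacement. The class names and the parameters `depth`, `num_sets`, and `ways` are hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass, field

# Hypothetical sketch: history depth, associativity and LRU replacement are
# assumptions for illustration, not parameters taken from the paper.
# A coherence message is modeled as a (message_type, processor_id) tuple.


@dataclass
class TwoLevelPredictor:
    """Per-block predictor: a history register (level 1) indexing a pattern table (level 2)."""
    depth: int = 1
    history: tuple = ()
    patterns: dict = field(default_factory=dict)

    def predict(self):
        # Return the message that followed the current history last time, if any.
        return self.patterns.get(self.history)

    def update(self, msg):
        # Once the history register is full, record history -> msg.
        if len(self.history) == self.depth:
            self.patterns[self.history] = msg
        # Shift msg in, keeping only the last `depth` messages.
        self.history = (self.history + (msg,))[-self.depth:]


class CoherencePredictorCache:
    """Set-associative cache of predictors for one memory controller (sketch)."""

    def __init__(self, num_sets, ways, depth=1):
        self.num_sets = num_sets
        self.ways = ways
        self.depth = depth
        # Each set maps block address -> predictor; insertion order tracks LRU.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, block_addr, msg):
        """Return the prediction made before `msg` arrived, then train on `msg`."""
        s = self.sets[block_addr % self.num_sets]
        pred = s.get(block_addr)
        if pred is None:
            self.misses += 1
            if len(s) >= self.ways:
                s.popitem(last=False)  # evict the least recently used predictor
            pred = TwoLevelPredictor(self.depth)
            s[block_addr] = pred
        else:
            self.hits += 1
            s.move_to_end(block_addr)  # mark as most recently used
        guess = pred.predict()
        pred.update(msg)
        return guess


# Usage: producer P0 writes and consumer P1 reads the same block, repeatedly.
cpc = CoherencePredictorCache(num_sets=64, ways=4)
trace = [("write", 0), ("read", 1)] * 4
correct = sum(1 for msg in trace if cpc.access(0x40, msg) == msg)
print(correct, cpc.hits, cpc.misses)  # → 5 7 1
```

In this sketch a predictor is allocated only when a block actually sees coherence traffic, so storage is bounded by `num_sets * ways` entries rather than one predictor per memory block. This mirrors the abstract's observation that only a small fraction of the footprint needs predictors. The price is that an evicted block loses its learned history and must retrain after it is reallocated.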