Distance-aware L2 Cache Organizations for Scalable Multiprocessor Systems

In this paper, we propose an LRU/distance-aware combined second-level (L2) cache for scalable CC-NUMA multiprocessors. It is composed of a traditional LRU cache and an additional cache that maintains the distance information of individual cache blocks. The LRU cache selects a victim based on age information, whereas the distance-aware cache selects a victim based on distance information. Working together, the two caches effectively reduce the overall distance incurred on cache misses by keeping long-distance blocks as well as recently used blocks. The proposed cache is observed to outperform the traditional LRU cache by up to 28% in execution time, and it even performs better than an LRU cache of twice the size.
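The abstract does not specify how the two parts interact, so the following is only a minimal Python sketch under stated assumptions, not the authors' design. It assumes that both parts are fully associative, that a block evicted from the LRU part moves into the distance-aware part, that a hit in the distance-aware part promotes the block back into the LRU part, and that the distance-aware part evicts the block with the smallest distance (the cheapest to refetch). The names `CombinedL2Cache`, `access`, and `total_miss_distance`, as well as the per-block distance values, are hypothetical.

```python
from collections import OrderedDict


class CombinedL2Cache:
    """Hypothetical LRU + distance-aware cache (assumed interaction, see lead-in)."""

    def __init__(self, lru_capacity, dist_capacity):
        self.lru = OrderedDict()   # block -> distance, oldest block first
        self.dist = {}             # block -> distance
        self.lru_capacity = lru_capacity
        self.dist_capacity = dist_capacity
        self.miss_distance = 0     # sum of distances paid on misses

    def access(self, block, distance):
        """Access `block`; `distance` is an assumed refetch cost (e.g. hops)."""
        if block in self.lru:
            self.lru.move_to_end(block)   # refresh its age
            return True
        if block in self.dist:
            # Assumption: a hit in the distance-aware part promotes the block.
            self._insert_lru(block, self.dist.pop(block))
            return True
        self.miss_distance += distance
        self._insert_lru(block, distance)
        return False

    def _insert_lru(self, block, distance):
        self.lru[block] = distance
        if len(self.lru) > self.lru_capacity:
            # Age-based victim: the least recently used block.
            victim, vdist = self.lru.popitem(last=False)
            self._insert_dist(victim, vdist)

    def _insert_dist(self, block, distance):
        if self.dist_capacity == 0:
            return                        # no distance-aware part: drop victim
        self.dist[block] = distance
        if len(self.dist) > self.dist_capacity:
            # Distance-based victim: the nearest block is cheapest to refetch.
            nearest = min(self.dist, key=self.dist.get)
            del self.dist[nearest]


def total_miss_distance(cache, trace):
    """Replay (block, distance) pairs and return the summed miss distance."""
    for block, distance in trace:
        cache.access(block, distance)
    return cache.miss_distance


if __name__ == "__main__":
    # Toy trace: "R" is a remote block (distance 10), the rest are local.
    trace = [("R", 10), ("a", 1), ("b", 1), ("R", 10), ("c", 1), ("R", 10)]
    print(total_miss_distance(CombinedL2Cache(2, 0), trace))  # plain LRU: 23
    print(total_miss_distance(CombinedL2Cache(1, 1), trace))  # combined: 13
```

With the same total capacity of two blocks, plain LRU repeatedly evicts the remote block behind a stream of local blocks, while the distance-aware part retains it. This toy trace only illustrates the intuition from the abstract and says nothing about the paper's measured results.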

Sung Woo Chung | Chu Shik Jhon | Hyong-Shik Kim