Paper Information - A Second-Level Cache With the Distance-Aware Replacement Policy for NUMA Systems

The cache replacement policy is one of the most important factors affecting cache performance. As the associativity of second-level caches increases, eliminating conflict misses is no longer enough, and an efficient replacement algorithm becomes more important. The LRU replacement algorithm is known to work well in a single-processor system because it reduces the cache miss rate, but in a multiprocessor system it does not minimize the cost of replacements on the interconnection network, because it does not take the distance to a block's home into account. In this paper, we propose a distance-aware second-level (L2) cache for scalable multiprocessors, composed of a traditional LRU cache and an additional SDF (Shortest Distance First) cache. The LRU cache selects a victim using age information, while the SDF cache selects one using distance information. The two work together to minimize the overall replacement cost by retaining long-distance blocks as well as recently used blocks. In many cases the combined L2 cache also has a lower miss rate than the original LRU cache. With 32 processors, a 512KB LRU/SDF L2 cache outperforms a 512KB LRU L2 cache. Moreover, the replacement traffic on an interconnection network such as a ring is reduced by up to 69%, which is expected to improve the scalability of multiprocessor systems.
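The interplay between the two victim-selection rules can be illustrated with a minimal Python sketch. It is not the paper's design: the abstract does not say how the LRU and SDF caches are connected, so the sketch assumes a victim-cache-style arrangement in which blocks evicted from the LRU part move into the SDF part, and the SDF part evicts the block with the shortest distance. The class name `LruSdfCache`, its methods, and the integer `distance` values are all hypothetical.

```python
from collections import OrderedDict


class LruSdfCache:
    """Toy model (assumed organization): an LRU cache backed by an SDF cache."""

    def __init__(self, lru_capacity, sdf_capacity):
        self.lru_capacity = lru_capacity
        self.sdf_capacity = sdf_capacity
        self.lru = OrderedDict()  # block -> distance, least recently used first
        self.sdf = {}             # block -> distance

    def access(self, block, distance):
        """Access a block; return True on a hit and False on a miss."""
        if block in self.lru:
            self.lru.move_to_end(block)  # refresh age information
            return True
        hit = block in self.sdf
        if hit:
            del self.sdf[block]          # assumption: promote back to the LRU part
        self._insert_lru(block, distance)
        return hit

    def _insert_lru(self, block, distance):
        self.lru[block] = distance
        if len(self.lru) > self.lru_capacity:
            # LRU rule: the oldest block is the victim, chosen by age alone.
            victim, victim_distance = self.lru.popitem(last=False)
            self._insert_sdf(victim, victim_distance)

    def _insert_sdf(self, block, distance):
        self.sdf[block] = distance
        if len(self.sdf) > self.sdf_capacity:
            # SDF rule: drop the block that is cheapest to fetch again.
            nearest = min(self.sdf, key=self.sdf.get)
            del self.sdf[nearest]
```

In this sketch, a block with a large distance that ages out of the LRU part tends to stay in the SDF part, because nearer blocks are evicted from it first. That is the behavior the abstract describes as keeping long-distance blocks as well as recently used ones.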
