A Second-Level Cache With the Distance-Aware Replacement Policy for NUMA Systems

The cache replacement policy is one of the most important factors affecting cache performance. As second-level caches become more highly associative, an efficient replacement algorithm matters more than merely eliminating conflict misses. LRU replacement is known to work well in uniprocessor systems because it reduces the cache miss rate. In a multiprocessor, however, it does not minimize the replacement cost on the interconnection network, because it does not take distance into account. In this paper, we propose a distance-aware second-level (L2) cache for scalable multiprocessors, composed of a traditional LRU cache and an additional SDF (Shortest Distance First) cache. The LRU cache selects a victim using age information, while the SDF cache selects one using distance information. Together they aim to minimize the overall replacement cost by retaining long-distance blocks as well as recently used blocks. In many cases, the combined L2 cache also achieves a lower miss rate than the original LRU cache. With 32 processors, a 512KB LRU/SDF L2 cache outperforms a 512KB LRU L2 cache. Moreover, replacement traffic on an interconnection network such as a ring is reduced by up to 69%, which should make multiprocessor systems more scalable.
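The interplay between the LRU and SDF parts can be illustrated with a small simulation. This is a minimal sketch built on assumptions the abstract does not state. Both parts are fully associative, and every block has a fixed distance (for example, network hops to its home node) that is also the cost of missing on it. Blocks evicted from the LRU part move into the SDF part, in the style of a victim cache, and an SDF hit moves the block back to the LRU part. The SDF part drops whichever block, resident or incoming, has the shortest distance. The class names, `replay`, and the trace are hypothetical.

```python
from collections import OrderedDict

# Hypothetical model: each block has a fixed "distance" (e.g. hops to its home
# node), and a miss costs that distance. Both caches are fully associative.


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block -> distance, least recently used first

    def access(self, block, distance):
        """Return True on a hit, False on a miss."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[block] = distance
        return False


class LRUSDFCache:
    def __init__(self, lru_capacity, sdf_capacity):
        self.lru = OrderedDict()  # block -> distance, least recently used first
        self.sdf = {}             # block -> distance
        self.lru_capacity = lru_capacity
        self.sdf_capacity = sdf_capacity

    def access(self, block, distance):
        """Return True on a hit in either part, False on a miss."""
        if block in self.lru:
            self.lru.move_to_end(block)
            return True
        hit = block in self.sdf
        if hit:
            del self.sdf[block]  # assumption: an SDF hit is promoted back to LRU
        self._fill_lru(block, distance)
        return hit

    def _fill_lru(self, block, distance):
        if len(self.lru) >= self.lru_capacity:
            old_block, old_distance = self.lru.popitem(last=False)
            self._fill_sdf(old_block, old_distance)  # LRU victim moves to SDF
        self.lru[block] = distance

    def _fill_sdf(self, block, distance):
        if self.sdf_capacity == 0:
            return
        if len(self.sdf) >= self.sdf_capacity:
            # Shortest Distance First: the block that is cheapest to refetch is
            # dropped; the incoming block competes with the residents.
            victim = min(self.sdf, key=self.sdf.get)
            if distance <= self.sdf[victim]:
                return  # the incoming block is itself the shortest-distance one
            del self.sdf[victim]
        self.sdf[block] = distance


def replay(cache, trace, distance):
    """Return (misses, total miss cost); each miss costs the block's distance."""
    misses = cost = 0
    for block in trace:
        if not cache.access(block, distance[block]):
            misses += 1
            cost += distance[block]
    return misses, cost


# One far block "F" and five near blocks, accessed cyclically.
distance = {"F": 8, "a": 1, "b": 1, "c": 1, "d": 1, "e": 1}
trace = ["F", "a", "b", "c", "d", "e"] * 2 + ["F"]
print(replay(LRUCache(4), trace, distance))        # (13, 34)
print(replay(LRUSDFCache(2, 2), trace, distance))  # (9, 16)
```

Both configurations hold four blocks in total. The cyclic six-block trace makes pure LRU miss on every access, while the SDF part keeps the far block F, so F is fetched only once. These numbers describe this toy trace only and are not results from the paper.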