Distance-aware L2 Cache Organizations for Scalable Multiprocessor Systems

In this paper, we propose a combined LRU/distance-aware second-level (L2) cache for scalable CC-NUMA multiprocessors. It consists of a traditional LRU cache and an additional cache that maintains distance information for individual cache blocks. The LRU cache selects a victim using age information, while the distance-aware cache selects one using distance information. By retaining long-distance blocks as well as recently used blocks, the two caches together reduce the overall distance incurred on cache misses. The proposed cache outperforms a traditional LRU cache by up to 28% in execution time, and it performs better even than an LRU cache of twice the size.
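To make the two victim-selection rules concrete, the following is a minimal Python sketch, not the organization evaluated in the paper. The abstract does not say how the two caches are connected, so the sketch assumes a victim-cache-style arrangement: blocks evicted from the LRU cache fall into the distance-aware cache, which evicts the block with the shortest distance when it is full. The class names, the fully associative layout, and the use of a single integer as the distance (for example, hops to the home node) are all illustrative assumptions.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache that evicts the least recently used block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # addr -> distance, oldest entry first

    def access(self, addr, distance):
        """Return (hit, evicted); evicted is an (addr, distance) pair or None."""
        if addr in self.blocks:
            self.blocks.move_to_end(addr)  # mark as most recently used
            return True, None
        evicted = None
        if len(self.blocks) >= self.capacity:
            evicted = self.blocks.popitem(last=False)  # victim chosen by age
        self.blocks[addr] = distance
        return False, evicted


class DistanceAwareCache:
    """Cache that evicts the block that is cheapest to fetch again."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = {}  # addr -> distance

    def insert(self, addr, distance):
        """Insert a block; if over capacity, drop the shortest-distance block."""
        self.blocks[addr] = distance
        if len(self.blocks) > self.capacity:
            victim = min(self.blocks, key=self.blocks.get)  # victim by distance
            del self.blocks[victim]
            return victim
        return None

    def remove(self, addr):
        """Remove and return the block's distance, or None if absent."""
        return self.blocks.pop(addr, None)


class CombinedCache:
    """Assumed arrangement: LRU victims spill into the distance-aware cache."""

    def __init__(self, lru_capacity, dist_capacity):
        self.lru = LRUCache(lru_capacity)
        self.dist = DistanceAwareCache(dist_capacity)
        self.miss_distance = 0  # total distance paid on true misses

    def access(self, addr, distance):
        """Return True if the block was found in either cache."""
        hit, evicted = self.lru.access(addr, distance)
        if hit:
            return True
        # The block is now in the LRU cache; take it out of the other cache.
        found = self.dist.remove(addr) is not None
        if evicted is not None:
            self.dist.insert(*evicted)
        if not found:
            self.miss_distance += distance
        return found
```

With `CombinedCache(2, 1)`, accessing blocks A (distance 8), then B, C, and D (distance 1 each) pushes A out of the LRU part, but A survives in the distance-aware part because later, shorter-distance victims are dropped in its place. A second access to A is then served without paying the long distance again, whereas a plain three-block LRU cache would already have evicted it.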
