Paper information - Dual cache architecture for low cost and high performance

Dual cache architecture for low cost and high performance

We present a high performance cache structure with a hardware prefetching mechanism that enhances exploitation of spatial and temporal locality. Temporal locality is exploited by selectively moving small blocks into the direct-mapped cache after monitoring their activity in the spatial buffer. Spatial locality is enhanced by intelligently prefetching a neighboring block when a spatial buffer hit occurs. We show that the prefetch operation is highly accurate: over 90% of all prefetches generated are for blocks that are subsequently accessed. Our results show that the system enables the cache size to be reduced by a factor of four to eight relative to a conventional direct-mapped cache while maintaining similar performance.
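The interaction between the two structures can be illustrated with a toy simulator. This is a minimal sketch under assumptions, not the authors' design: the abstract gives no block sizes, buffer capacity, or replacement policy, so the class name `DualCache`, its parameters, the FIFO replacement, and the rule "move a small block into the direct-mapped cache only if it was referenced while its large block sat in the spatial buffer" are all hypothetical choices made for illustration. The prefetch condition is likewise simplified to "fetch the next sequential large block on a spatial buffer hit if it is not already buffered".

```python
from collections import OrderedDict


class DualCache:
    """Toy model: direct-mapped cache of small blocks plus a spatial buffer
    of large blocks. All sizes and policies here are illustrative assumptions."""

    def __init__(self, dm_lines=4, small=8, large=32, buf_entries=2):
        self.small = small              # small block size in bytes (assumed)
        self.large = large              # large block size in bytes (assumed)
        self.dm_lines = dm_lines
        self.dm = [None] * dm_lines     # each line holds a small-block number
        self.buf = OrderedDict()        # large-block number -> touched small blocks
        self.buf_entries = buf_entries
        self.prefetches = 0

    def _fill(self, large_blk):
        """Bring a large block into the spatial buffer (FIFO replacement, assumed)."""
        if large_blk in self.buf:
            return
        if len(self.buf) >= self.buf_entries:
            _, touched = self.buf.popitem(last=False)   # evict the oldest entry
            # Selective move: only small blocks that were actually referenced
            # are copied into the direct-mapped cache.
            for sb in touched:
                self.dm[sb % self.dm_lines] = sb
        self.buf[large_blk] = set()

    def access(self, addr):
        sb = addr // self.small
        lb = addr // self.large
        if self.dm[sb % self.dm_lines] == sb:
            return "dm_hit"
        if lb in self.buf:
            self.buf[lb].add(sb)        # record activity in the spatial buffer
            if lb + 1 not in self.buf:  # prefetch the neighboring large block
                self._fill(lb + 1)
                self.prefetches += 1
            return "buf_hit"
        self._fill(lb)
        self.buf[lb].add(sb)
        return "miss"


if __name__ == "__main__":
    cache = DualCache()
    for address in (0, 8, 32, 0):
        print(address, cache.access(address))
    print("prefetches:", cache.prefetches)
```

In this sketch, address 0 is first served from memory, then promoted to the direct-mapped cache when its large block leaves the spatial buffer, so the second reference to address 0 hits in the direct-mapped cache. The numbers it produces say nothing about the accuracy or size-reduction results reported in the abstract.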

Shin-Dug Kim | Jung-Hoon Lee | Gi-Ho Park
