Dual cache architecture for low cost and high performance

We present a high performance cache structure with a hardware prefetching mechanism that enhances exploitation of spatial and temporal locality. Temporal locality is exploited by selectively moving small blocks into the direct-mapped cache after monitoring their activity in the spatial buffer. Spatial locality is enhanced by intelligently prefetching a neighboring block when a spatial buffer hit occurs. We show that the prefetch operation is highly accurate: over 90% of all prefetches generated are for blocks that are subsequently accessed. Our results show that the system enables the cache size to be reduced by a factor of four to eight relative to a conventional direct-mapped cache while maintaining similar performance.

[1]  Richard E. Hank,et al.  An efficient architecture for loop based data preloading , 1992, MICRO 1992.

[2]  G. Albera,et al.  Power/performance advantages of victim buffer in high-performance processors , 1999, Proceedings IEEE Alessandro Volta Memorial Workshop on Low-Power Design.

[3]  Mateo Valero,et al.  Static locality analysis for cache management , 1997, Proceedings 1997 International Conference on Parallel Architectures and Compilation Techniques.

[4]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and pre , 1990, ISCA 1990.

[5]  Kanad Ghose,et al.  Analytical energy dissipation models for low-power caches , 1997, ISLPED '97.

[6]  Anoop Gupta,et al.  Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.

[7]  Steven Przybylski The performance impact of block sizes and fetch strategies , 1990, ISCA '90.

[8]  Michael J. Flynn,et al.  An area model for on-chip memories and its application , 1991 .

[9]  Edward S. Davidson,et al.  Reducing conflicts in direct-mapped caches with a temporality-based design , 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.

[10]  Ken Chan,et al.  PA7200: a PA-RISC processor with integrated high performance MP bus interface , 1994, Proceedings of COMPCON '94.

[11]  Jean-Loup Baer,et al.  An effective on-chip preloading scheme to reduce data access penalty , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[12]  Antonio Gonzalez,et al.  A data cache with multiple caching strategies tuned to different types of locality , 1995, International Conference on Supercomputing.