Improving data cache performance with integrated use of split caches, victim cache and stream buffers

In our prior work we explored a cache organization that provides architectural support for distinguishing between memory references that exhibit spatial locality and those that exhibit temporal locality, and for mapping them to separate caches. That work showed that using separate data caches for indexed or stream data and for scalar data items could substantially reduce cache misses. In addition, such a separation allowed for the design of caches tailored to the properties exhibited by different data items. In this paper, we investigate the interaction between three established methods: split cache, victim cache and stream buffer. Since significant amounts of compulsory and conflict misses are avoided, the size of each cache (i.e., array and scalar), as well as the combined cache capacity, can be reduced. Our results show an average 55% reduction in miss rates over the base configuration.
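
As a rough illustration of why a victim cache removes conflict misses, the following minimal Python sketch models a direct-mapped cache backed by a small fully associative victim cache. It is not the configuration evaluated in the paper: the class name `DirectMappedCache`, the helper `count_misses`, the LRU replacement in the victim cache, and the toy trace are all assumptions made for illustration, and the split (array/scalar) caches and stream buffers are left out.

```python
from collections import OrderedDict


class DirectMappedCache:
    """Toy direct-mapped cache of block addresses with an optional victim cache.

    Hypothetical model for illustration only; sizes and policies are assumptions.
    """

    def __init__(self, num_sets, victim_entries=0):
        self.num_sets = num_sets
        self.lines = [None] * num_sets   # one resident block address per set
        self.victim = OrderedDict()      # evicted blocks, oldest first (LRU)
        self.victim_entries = victim_entries
        self.misses = 0

    def access(self, block):
        """Access a block address; return True on a hit (main or victim cache)."""
        idx = block % self.num_sets
        if self.lines[idx] == block:
            return True                  # hit in the direct-mapped cache

        evicted = self.lines[idx]
        if block in self.victim:
            del self.victim[block]       # victim hit: block moves back to main cache
            hit = True
        else:
            self.misses += 1             # miss in both structures
            hit = False

        self.lines[idx] = block
        if evicted is not None and self.victim_entries > 0:
            self.victim[evicted] = True  # displaced block goes to the victim cache
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)  # drop the oldest victim entry
        return hit


def count_misses(trace, num_sets, victim_entries):
    """Replay a trace of block addresses and return the number of misses."""
    cache = DirectMappedCache(num_sets, victim_entries)
    for block in trace:
        cache.access(block)
    return cache.misses


if __name__ == "__main__":
    # Blocks 0 and 8 map to the same set of an 8-set cache and keep evicting each other.
    trace = [0, 8] * 10
    print(count_misses(trace, num_sets=8, victim_entries=0))  # 20: every access misses
    print(count_misses(trace, num_sets=8, victim_entries=4))  # 2: only the compulsory misses
```

In this toy trace the two blocks conflict on every access, so the plain direct-mapped cache misses 20 times, while the victim cache leaves only the two compulsory misses. Real workloads mix conflict, compulsory and capacity misses, so the benefit depends on the reference stream.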
