Improving data cache performance with integrated use of split caches, victim cache and stream buffers

In our prior work we explored a cache organization that provides architectural support for distinguishing between memory references that exhibit spatial locality and those that exhibit temporal locality, and for mapping them to separate caches. That work showed that using separate data caches for indexed or stream data and for scalar data items could substantially reduce cache misses. In addition, such a separation allowed each cache to be tailored to the properties exhibited by the data items it holds. In this paper, we investigate the interaction between three established methods: split caches, victim caches and stream buffers. Because a significant number of compulsory and conflict misses are avoided, the size of each cache (i.e., array and scalar), as well as the combined cache capacity, can be reduced. Our results show, on average, a 55% reduction in miss rates over the base configuration.
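
To make the interaction of the three techniques concrete, the following toy simulation is a minimal sketch and not the configuration evaluated in the paper. It assumes direct-mapped caches addressed by block number, an LRU victim cache attached to the scalar cache, and a single sequential stream buffer attached to the array cache. The class name `Cache`, the function `simulate`, the cache sizes and the synthetic trace are all hypothetical choices made for illustration.

```python
from collections import OrderedDict, deque


class Cache:
    """Toy direct-mapped cache with an optional victim cache and stream buffer.

    Assumptions (not from the paper): addresses are block numbers, the victim
    cache is fully associative with LRU replacement, and the stream buffer is
    a single FIFO that prefetches the next sequential blocks after a miss.
    """

    def __init__(self, num_lines, victim_entries=0, stream_depth=0):
        self.num_lines = num_lines
        self.lines = [None] * num_lines   # block held in each frame
        self.victim_entries = victim_entries
        self.victims = OrderedDict()      # recently evicted blocks, LRU order
        self.stream_depth = stream_depth
        self.stream = deque()             # prefetched sequential blocks
        self.misses = 0

    def access(self, block):
        idx = block % self.num_lines
        if self.lines[idx] == block:
            return                        # hit in the main cache
        evicted, self.lines[idx] = self.lines[idx], block
        if block in self.victims:
            # Conflict miss avoided: the block is swapped back in.
            del self.victims[block]
        elif self.stream and self.stream[0] == block:
            # Compulsory miss avoided: the block was already prefetched.
            self.stream.popleft()
            last = self.stream[-1] if self.stream else block
            self.stream.append(last + 1)
        else:
            self.misses += 1
            if self.stream_depth:
                # Restart the stream at the block following the miss.
                self.stream = deque(
                    range(block + 1, block + 1 + self.stream_depth))
        if evicted is not None and self.victim_entries:
            self.victims[evicted] = True
            if len(self.victims) > self.victim_entries:
                self.victims.popitem(last=False)   # drop the LRU victim


def simulate():
    # Synthetic trace: a sequential array sweep interleaved with two scalars
    # (blocks 100 and 108) that conflict in a direct-mapped cache.
    trace = []
    for i in range(16):
        trace += [("array", i), ("scalar", 100), ("scalar", 108)]

    base = Cache(num_lines=8)                        # unified baseline
    array_cache = Cache(num_lines=4, stream_depth=4)
    scalar_cache = Cache(num_lines=4, victim_entries=2)

    for kind, block in trace:
        base.access(block)
        (array_cache if kind == "array" else scalar_cache).access(block)

    return base.misses, array_cache.misses + scalar_cache.misses


if __name__ == "__main__":
    base_misses, split_misses = simulate()
    print("base misses:", base_misses, "split misses:", split_misses)
```

On this deliberately favorable trace the split organization misses far less often than the unified baseline at the same total number of lines. The stream buffer absorbs the sequential array misses and the victim cache absorbs the scalar conflicts. The numbers only illustrate the mechanism and say nothing about the benchmark results reported in the paper.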
