A Study of Separate Array and Scalar Caches

to substantial reductions in cache misses. In addition, such a separation allowed for the design of caches tailored to the properties exhibited by different data items. In this paper we explore a similar cache organization that provides architectural support for distinguishing between memory references that exhibit spatial locality and those that exhibit temporal locality, and for mapping them to separate caches. Because a significant number of compulsory and conflict misses are avoided, both the size of each cache (array and scalar) and the combined cache capacity can be reduced. Our simulation results indicate that a partitioned organization, with a 4 KB scalar cache and the streams (arrays) mapped to a 2 KB array cache, can be more efficient than a 16 KB unified data cache.
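The organization described above can be illustrated with a small trace-driven sketch. It is not the simulator used in this study. It assumes direct-mapped caches, 32-byte lines in every cache, reads and writes treated identically, and a per-reference flag (`is_array`, hypothetical) that marks each access as an array (stream) or scalar reference. The trace comes from a hypothetical loop `s = s + a[i] * k`. The cache sizes mirror the 4 KB / 2 KB / 16 KB configuration above.

```python
"""Toy trace-driven comparison of a unified data cache with a partitioned
array/scalar cache. All parameters and the trace are illustrative assumptions."""


class DirectMappedCache:
    """Direct-mapped cache model that tracks tags only and counts hits/misses.
    Reads and writes are treated identically (a simplifying assumption)."""

    def __init__(self, size_bytes: int, line_bytes: int) -> None:
        if size_bytes % line_bytes:
            raise ValueError("size must be a multiple of the line size")
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines  # one tag per line; None = empty
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, install the line, evicting any occupant."""
        block = addr // self.line_bytes
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag
        self.misses += 1
        return False


class PartitionedCache:
    """Sends references flagged as array (stream) accesses to one cache and
    all other (scalar) references to a separate cache."""

    def __init__(self, array_cache: DirectMappedCache, scalar_cache: DirectMappedCache) -> None:
        self.array_cache = array_cache
        self.scalar_cache = scalar_cache

    def access(self, addr: int, is_array: bool) -> bool:
        cache = self.array_cache if is_array else self.scalar_cache
        return cache.access(addr)

    @property
    def misses(self) -> int:
        return self.array_cache.misses + self.scalar_cache.misses


def make_trace(n: int, array_base: int = 0x10000, s_addr: int = 0x0, k_addr: int = 0x8):
    """Hypothetical trace for `s = s + a[i] * k` over 8-byte elements:
    read a[i], read s, read k, write s. Each entry is (address, is_array)."""
    trace = []
    for i in range(n):
        trace.append((array_base + 8 * i, True))
        trace.append((s_addr, False))
        trace.append((k_addr, False))
        trace.append((s_addr, False))
    return trace


def simulate(trace):
    """Run the same trace through a 16 KB unified cache and a 2 KB + 4 KB split."""
    unified = DirectMappedCache(16 * 1024, 32)
    partitioned = PartitionedCache(
        array_cache=DirectMappedCache(2 * 1024, 32),
        scalar_cache=DirectMappedCache(4 * 1024, 32),
    )
    for addr, is_array in trace:
        unified.access(addr)
        partitioned.access(addr, is_array)
    return unified, partitioned


if __name__ == "__main__":
    unified, partitioned = simulate(make_trace(4096))
    print("unified 16 KB misses:", unified.misses)
    print("partitioned 2 KB + 4 KB misses:", partitioned.misses)
```

With `make_trace(4096)`, both organizations take one compulsory miss per array line (1024 lines). The base address `0x10000` is a multiple of 16 KB, so in the unified cache the stream maps to the same set as `s` and `k` whenever it wraps around, and the array and scalar lines evict each other. The result is 1038 unified misses against 1025 partitioned misses (1024 array + 1 scalar), with 6 KB of total capacity instead of 16 KB. The size of the gap is a property of this toy trace and is not a reproduction of the study's results.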
