Tiny split data-caches make big performance impact for embedded applications

This paper shows that even very small data caches, when split to serve data streams exhibiting temporal and spatial locality, can improve the performance of embedded applications without consuming excessive silicon area or power. It also shows that large block sizes and high set associativity are unnecessary with split cache organizations. Using benchmark programs from MiBench, we show that our cache organization outperforms other organizations in miss rate, access time, energy consumption, and silicon area.
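To illustrate the idea of a split data cache, the following is a minimal sketch, not the organization evaluated in the paper. It assumes two tiny direct-mapped caches: one with larger blocks for the stream showing spatial locality and one with small blocks for the stream showing temporal locality. All names (`DirectMappedCache`, `SplitDataCache`, `demo`), the cache sizes, the `is_spatial` routing flag, and the contrived address trace are assumptions made for illustration; how accesses are actually classified is not described in the abstract.

```python
class DirectMappedCache:
    """Tiny direct-mapped cache model that only tracks hits and misses."""

    def __init__(self, num_lines, block_size):
        self.num_lines = num_lines
        self.block_size = block_size      # bytes per block
        self.tags = [None] * num_lines    # block number held by each line
        self.accesses = 0
        self.misses = 0

    def access(self, addr):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        block = addr // self.block_size
        index = block % self.num_lines
        self.accesses += 1
        if self.tags[index] == block:
            return True
        self.tags[index] = block
        self.misses += 1
        return False


class SplitDataCache:
    """Routes each access to a spatial-locality or temporal-locality cache."""

    def __init__(self, spatial, temporal):
        self.spatial = spatial
        self.temporal = temporal

    def access(self, addr, is_spatial):
        # Assumption: the caller already knows which stream an access belongs to.
        cache = self.spatial if is_spatial else self.temporal
        return cache.access(addr)

    def misses(self):
        return self.spatial.misses + self.temporal.misses


def demo():
    """Compare a 160-byte split cache with a 256-byte unified cache on a toy trace."""
    split = SplitDataCache(
        spatial=DirectMappedCache(num_lines=4, block_size=32),   # 128 bytes
        temporal=DirectMappedCache(num_lines=4, block_size=8),   # 32 bytes
    )
    unified = DirectMappedCache(num_lines=8, block_size=32)      # 256 bytes

    # Contrived trace: sweep a 64-word array while re-reading one scalar.
    # The two base addresses are chosen so that they conflict in the unified cache.
    array_base, scalar_addr = 0x1000, 0x8000
    for i in range(64):
        for addr, is_spatial in ((array_base + 4 * i, True), (scalar_addr, False)):
            split.access(addr, is_spatial)
            unified.access(addr)

    return split.misses(), unified.misses, unified.accesses


if __name__ == "__main__":
    split_misses, unified_misses, total = demo()
    print(f"split: {split_misses}/{total} misses, unified: {unified_misses}/{total} misses")
```

On this hand-built trace the smaller split configuration misses less often than the larger unified cache because the scalar no longer competes with the array for the same line. This is a property of the toy trace only and says nothing about the MiBench results reported in the paper.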
