Customized Placement for High Performance Embedded Processor Caches

In this paper, we propose the use of compiler-controlled, customized placement policies for embedded processor data caches. Profile-driven customized placement improves the sharing of cache resources across memory lines, which reduces conflict misses, lowers the average memory access time (AMAT), and consequently shortens execution time. Alternatively, customized placement policies can be used to reduce the cache size and associativity for a fixed AMAT, with an attendant reduction in power and area. These advantages come at the cost of a small increase in the complexity of the address translation used to index the cache. The consequent increase in critical path length is offset by the lower miss rates. Simulation experiments with embedded benchmark kernels show that caches with customized placement provide miss rates comparable to those of traditional caches with larger sizes and higher associativities.
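
The trade-off described above can be illustrated with a minimal sketch. It relies on the standard relation AMAT = hit time + miss rate × miss penalty and simulates a tiny direct-mapped cache under two index functions. Everything specific here is an assumption made for illustration: the XOR-folding function `xor_index` merely stands in for a customized placement and is not the profile-driven policy of the paper, and the cache geometry, the access trace, and the timing numbers (including the extra 0.2 cycles of hit time charged to the custom index) are hypothetical.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time plus the expected cost of a miss."""
    return hit_time + miss_rate * miss_penalty


def miss_rate(trace, num_sets, line_size, index_fn):
    """Miss rate of a direct-mapped cache under the given placement function."""
    sets = [None] * num_sets            # each set holds at most one line address
    misses = 0
    for addr in trace:
        line = addr // line_size        # memory line touched by this access
        idx = index_fn(line, num_sets)  # placement: which set the line maps to
        if sets[idx] != line:
            misses += 1
            sets[idx] = line            # fill the set, evicting the old line
    return misses / len(trace)


def modulo_index(line, num_sets):
    """Conventional placement: the low-order bits of the line address."""
    return line % num_sets


def xor_index(line, num_sets):
    """Hypothetical custom placement: XOR the upper bits into the index."""
    return (line ^ (line // num_sets)) % num_sets


# Assumed geometry: 4 sets of 16-byte lines (a 64-byte cache).
NUM_SETS, LINE_SIZE = 4, 16

# Assumed trace: arrays A (base 0) and B (base 64) are accessed in lockstep,
# so A[i] and B[i] collide in the same set under modulo indexing.
trace = []
for i in range(NUM_SETS):
    for _ in range(10):
        trace += [i * LINE_SIZE, NUM_SETS * LINE_SIZE + i * LINE_SIZE]

mr_modulo = miss_rate(trace, NUM_SETS, LINE_SIZE, modulo_index)
mr_custom = miss_rate(trace, NUM_SETS, LINE_SIZE, xor_index)

# Assumed timing: the custom index adds 0.2 cycles to the hit time.
amat_modulo = amat(hit_time=1.0, miss_rate=mr_modulo, miss_penalty=20.0)
amat_custom = amat(hit_time=1.2, miss_rate=mr_custom, miss_penalty=20.0)

print(f"modulo: miss rate {mr_modulo:.2f}, AMAT {amat_modulo:.2f} cycles")
print(f"custom: miss rate {mr_custom:.2f}, AMAT {amat_custom:.2f} cycles")
```

On this contrived trace, the modulo index makes every access a conflict miss, while the alternative index leaves only the compulsory misses, so the slightly longer hit time is more than repaid. Real workloads will show far smaller differences, and the benefit depends on how well the placement function matches the profiled access pattern.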
