A Study of Reconfigurable Split Data Caches and Instruction Caches

In this paper we show that cache memories for embedded applications can be designed to increase performance while reducing energy consumption. We show that using separate data caches for indexed or streaming data and for scalar data can lead to substantial reductions in cache misses. The sizes of the various cache structures should be customized to meet each application's needs. We show that reconfigurable split data caches can be designed to meet the performance, energy, and silicon-area budgets of a wide range of embedded applications. For media benchmarks from the MiBench suite, the optimal cache organizations lead, on average, to 62% and 49% reductions in overall cache size, 37% and 21% reductions in cache access time, and 47% and 52% reductions in power consumption for the instruction and data caches respectively, when compared to an 8 KB instruction cache and an 8 KB unified data cache.
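To make the split-data-cache idea concrete, the sketch below routes each memory reference either to a small scalar cache or to a separately sized array/stream cache, with hits and misses tracked per structure. This is a minimal illustration, not the paper's simulator: the direct-mapped organization, the 2 KB and 4 KB sizes, the 32-byte line size, and the hand-written `is_array` classification flag are assumptions made for the example; in a real system the classification would come from the compiler or from the observed access pattern.

```c
/*
 * Minimal sketch (not the authors' simulator) of a split data cache:
 * array/stream references go to one cache, scalar references to another,
 * and each cache is sized independently.  All parameters are illustrative.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32u               /* assumed cache line size */

typedef struct {
    uint32_t *tags;                  /* one tag per direct-mapped set */
    uint8_t  *valid;
    uint32_t  num_sets;
    uint64_t  hits, misses;
} cache_t;

static void cache_init(cache_t *c, uint32_t size_bytes,
                       uint32_t *tag_store, uint8_t *valid_store)
{
    c->num_sets = size_bytes / LINE_BYTES;
    c->tags  = tag_store;
    c->valid = valid_store;
    memset(valid_store, 0, c->num_sets);
    c->hits = c->misses = 0;
}

static void cache_access(cache_t *c, uint32_t addr)
{
    uint32_t block = addr / LINE_BYTES;
    uint32_t set   = block % c->num_sets;
    uint32_t tag   = block / c->num_sets;

    if (c->valid[set] && c->tags[set] == tag) {
        c->hits++;
    } else {                         /* miss: fill the line */
        c->misses++;
        c->valid[set] = 1;
        c->tags[set]  = tag;
    }
}

int main(void)
{
    /* Independently sized caches: e.g. 2 KB for scalars, 4 KB for arrays. */
    static uint32_t scalar_tags[2048 / LINE_BYTES], array_tags[4096 / LINE_BYTES];
    static uint8_t  scalar_valid[2048 / LINE_BYTES], array_valid[4096 / LINE_BYTES];
    cache_t scalar_cache, array_cache;
    cache_init(&scalar_cache, 2048, scalar_tags, scalar_valid);
    cache_init(&array_cache,  4096, array_tags,  array_valid);

    /* Toy reference stream: (address, is_array_access) pairs.  The flag is
     * hand-written here purely for illustration. */
    struct { uint32_t addr; int is_array; } refs[] = {
        {0x1000, 1}, {0x1020, 1}, {0x1040, 1},   /* streaming over an array */
        {0x8000, 0}, {0x8004, 0}, {0x8000, 0},   /* reuse of scalar data    */
    };

    for (size_t i = 0; i < sizeof refs / sizeof refs[0]; i++) {
        cache_t *c = refs[i].is_array ? &array_cache : &scalar_cache;
        cache_access(c, refs[i].addr);
    }

    printf("scalar cache: %llu hits, %llu misses\n",
           (unsigned long long)scalar_cache.hits,
           (unsigned long long)scalar_cache.misses);
    printf("array  cache: %llu hits, %llu misses\n",
           (unsigned long long)array_cache.hits,
           (unsigned long long)array_cache.misses);
    return 0;
}
```

Because each structure is sized and tracked separately, its organization can be tuned to the reference stream it serves, which is the property exploited when customizing the split caches to an application's budget.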
