Dynamic techniques to reduce memory traffic in embedded systems

Memory transfers, in particular from/to off-chip memories, consume a significant amount of power. In order to reduce the amount of off-chip memory traffic, one or more levels of cache can be employed, located on the same die as the processor core. For performance, energy, and cost reasons, it is expedient that the on-chip cache is small and direct-mapped. Small, direct-mapped caches, however, generally produce much more traffic than needed. The purpose of this paper is two-fold. First, to measure how much traffic is generated by small, direct-mapped caches and what the minimal amount of traffic is. This yields an upper bound on the amount of traffic that can be saved by utilizing the on-chip memory more effectively. Second, we survey some techniques that can be deployed to reduce the amount of traffic produced by direct-mapped caches and present results for some of these techniques.

[1]  Peter Petrov,et al.  Performance and power effectiveness in embedded processors customizable partitioned caches , 2001, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[2]  Eric Rotenberg,et al.  Adaptive mode control: A static-power-efficient cache design , 2003, TECS.

[3]  Jeffrey B. Rothman,et al.  The pool of subsectors cache design , 1999, ICS '99.

[4]  Norman P. Jouppi,et al.  CACTI 2.0: An Integrated Cache Timing and Power Model , 2002 .

[5]  Jean-Loup Baer,et al.  Pursuing the performance potential of dynamic cache line sizes , 1999, Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040).

[6]  Alexandru Nicolau,et al.  Memory Issues in Embedded Systems-on-Chip , 1999 .

[7]  Luca Benini,et al.  Energy-efficient design of battery-powered embedded systems , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[8]  William H. Mangione-Smith,et al.  Filtering Memory References to Increase Energy Efficiency , 2000, IEEE Trans. Computers.

[9]  Mateo Valero,et al.  Eliminating cache conflict misses through XOR-based placement functions , 1997, ICS '97.

[10]  H. De Man,et al.  Global communication and memory optimizing transformations for low power signal processing systems , 1994, Proceedings of 1994 IEEE Workshop on VLSI Signal Processing.

[11]  Erik Brockmeyer,et al.  Data Access and Storage Management for Embedded Programmable Processors , 2002, Springer US.

[12]  Split Temporal / Spatial Cache : A Survey and Reevaluation of Performance 0 , 1999 .

[13]  D. Burger,et al.  Memory Bandwidth Limitations of Future Microprocessors , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[14]  Anant Agarwal,et al.  Column-associative caches: a technique for reducing the miss rate of direct-mapped caches , 1993, ISCA '93.

[15]  Francky Catthoor Energy-Delay Efficient Data Storage and Transfer Architectures and Methodologies: Current Solutions and Remaining Problems , 1999, J. VLSI Signal Process..

[16]  G. Albera,et al.  Power/performance advantages of victim buffer in high-performance processors , 1999, Proceedings IEEE Alessandro Volta Memorial Workshop on Low-Power Design.

[17]  Kazuaki Murakami,et al.  Way-predicting set-associative cache for high performance and low energy consumption , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[18]  Stefanos Kaxiras,et al.  Cache-Line Decay: A Mechanism to Reduce Cache Leakage Power , 2000, PACS.

[19]  Wen-mei W. Hwu,et al.  Run-time spatial locality detection and optimization , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[20]  Michael C. Huang,et al.  L1 data cache decomposition for energy efficiency , 2001, ISLPED '01.

[21]  Glenn Reinman,et al.  Reducing energy and delay using efficient victim caches , 2003, ISLPED '03.

[22]  Kaushik Roy,et al.  An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[23]  Mateo Valero,et al.  A Data Cache with Multiple Caching Strategies Tuned to Different Types of Locality , 1995, International Conference on Supercomputing.

[24]  Santosh G. Abraham,et al.  Efficient simulation of caches under optimal replacement with applications to miss characterization , 1993, SIGMETRICS '93.

[25]  Norman P. Jouppi,et al.  An Integrated Cache Timing and Power Model , 2002 .

[26]  Rajesh K. Gupta,et al.  Adapting cache line size to application behavior , 1999, ICS '99.

[27]  Krishna V. Palem,et al.  Design space optimization of embedded memory systems via data remapping , 2002, LCTES/SCOPES '02.

[28]  Trevor Mudge,et al.  MiBench: A free, commercially representative embedded benchmark suite , 2001 .

[29]  Ben H. H. Juurlink,et al.  Unified dual data caches , 2003, Euromicro Symposium on Digital System Design, 2003. Proceedings..

[30]  Gary S. Tyson,et al.  Region-based caching: an energy-delay efficient memory architecture for embedded processors , 2000, CASES '00.

[31]  Teresa H. Meng,et al.  Portable video-on-demand in wireless communication , 1995, Proc. IEEE.

[32]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[33]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[34]  Ben H. H. Juurlink,et al.  Reducing traffic generated by conflict misses in caches , 2004, CF '04.