HitME: Low power Hit MEmory buffer for embedded systems

In this paper, we present a novel HitME (Hit-MEmory) buffer to reduce the energy consumption of the memory hierarchy in embedded processors. The HitME buffer is a small direct-mapped cache memory that is added to an existing cache memory hierarchy. The HitME buffer is loaded only when there is a hit in the L1 cache. On an L1 miss, the L1 cache is updated from memory and the processor's memory request is served directly from the L1 cache, leaving the HitME buffer unchanged. The strategy works because 90% of accessed memory locations are accessed only once, and these often pollute the cache. Energy reduction is achieved by reducing the number of accesses to the L1 cache memory. Experimental results show that the HitME buffer reduces L1 cache accesses, which lowers the cache system energy consumption by an average of 60.9% compared with a traditional L1 cache memory architecture and by 6.4% compared with a filter cache architecture, for 70 nm cache technology.
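
The fill policy described above can be illustrated with a minimal Python sketch. It is not the authors' simulator: the names `DirectMapped` and `simulate`, the cache sizes, the direct-mapped L1, and the choice to probe the HitME buffer before the L1 cache are all assumptions made for illustration. The sketch only counts accesses and does not model energy.

```python
class DirectMapped:
    """Tag store of a direct-mapped cache (hypothetical helper, tags only)."""

    def __init__(self, num_lines, line_bytes):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines

    def _split(self, addr):
        block = addr // self.line_bytes
        return block % self.num_lines, block // self.num_lines

    def lookup(self, addr):
        index, tag = self._split(addr)
        return self.tags[index] == tag

    def fill(self, addr):
        index, tag = self._split(addr)
        self.tags[index] = tag


def simulate(trace, hitme_lines=4, l1_lines=64, line_bytes=16):
    """Count accesses for a trace of byte addresses (sizes are assumed)."""
    hitme = DirectMapped(hitme_lines, line_bytes)
    l1 = DirectMapped(l1_lines, line_bytes)
    counts = {"hitme_hits": 0, "l1_accesses": 0, "l1_misses": 0}
    for addr in trace:
        # Assumption: the HitME buffer is probed first; a hit avoids L1.
        if hitme.lookup(addr):
            counts["hitme_hits"] += 1
            continue
        counts["l1_accesses"] += 1
        if l1.lookup(addr):
            # L1 hit: only now is the line copied into the HitME buffer.
            hitme.fill(addr)
        else:
            # L1 miss: fill L1 from memory and serve from L1;
            # the HitME buffer is left untouched.
            counts["l1_misses"] += 1
            l1.fill(addr)
    return counts
```

In this sketch, an address touched only once never enters the HitME buffer, while a reused address is promoted on its second access (its first L1 hit) and is served from the buffer afterwards. For example, `simulate([0x100] * 4)` records two L1 accesses and two HitME hits, whereas a trace of ten distinct lines records ten L1 accesses and no HitME hits.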
