Instruction Cache Energy Saving Through Compiler Way-Placement

Fetching instructions from a set-associative cache in an embedded processor can consume a large amount of energy due to the tag checks performed. Recent proposals to address this issue involve predicting or memoizing the correct way to access. However, they also require significant hardware storage which negates much of the energy saving. This paper proposes way-placement to save instruction cache energy. The compiler places the most frequently executed instructions at the start of the binary and at runtime these are mapped to explicit ways within the cache. We compare with a state-of-the-art hardware technique and show that our scheme saves almost 50% of the instruction cache energy compared to 32% for the hardware approach. We report results on a variety of cache sizes and associativities, achieving 59% instruction cache energy savings and an ED product of 0.80 in the best configuration with negligible hardware overhead and no ISA changes.

[1]  Jihong Kim,et al.  Instruction cache organisation for embedded low-power processors , 2001 .

[2]  William H. Mangione-Smith,et al.  The filter cache: an energy efficient memory structure , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[3]  Michael Zhang,et al.  Highly-Associative Caches for Low-Power Processors , 2000 .

[4]  K. De Bosschere,et al.  DIABLO: a reliable, retargetable and extensible link-time rewriting framework , 2005, Proceedings of the Fifth IEEE International Symposium on Signal Processing and Information Technology, 2005..

[5]  Kazuaki Murakami,et al.  Way-predicting set-associative cache for high performance and low energy consumption , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[6]  Tohru Ishihara,et al.  A non-uniform cache architecture for low power system design , 2005, ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005..

[7]  Scott A. Mahlke,et al.  Compiler managed dynamic instruction placement in a low-power code cache , 2005, International Symposium on Code Generation and Optimization.

[8]  Albert Ma,et al.  Way Memoization to Reduce Fetch Energy in Instruction Caches , 2001 .

[9]  Margaret Martonosi,et al.  XTREM: a power simulator for the Intel XScale® core , 2004, LCTES '04.

[10]  Margaret Martonosi,et al.  Cache decay: exploiting generational behavior to reduce cache leakage power , 2001, ISCA 2001.

[11]  Trevor Mudge,et al.  MiBench: A free, commercially representative embedded benchmark suite , 2001 .

[12]  David Blaauw,et al.  Drowsy caches: simple techniques for reducing leakage power , 2002, ISCA.

[13]  Wu-chun Feng,et al.  Towards efficient supercomputing: a quest for the right metric , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[14]  Richard T. Witek,et al.  A 160 MHz 32 b 0.5 W CMOS RISC microprocessor , 1996, 1996 IEEE International Solid-State Circuits Conference. Digest of TEchnical Papers, ISSCC.

[15]  Ibrahim N. Hajj,et al.  Architectural and compiler techniques for energy reduction in high-performance microprocessors , 2000, IEEE Trans. Very Large Scale Integr. Syst..