Power efficient embedded processor IPs through application-specific tag compression in data caches

In this paper, we present a methodology for power minimization by data cache tag compression. The set of tags being accessed by the major application loops is analyzed statically during compile time and an efficient and optimal compression scheme is proposed Only a very limited number of tag bits are stored in the tag array for cache conflict identification, thus achieving a significant reduction in the number of active bitlines, sense amps, and comparator cells. The underlying hardware support for dynamically compressing the tags consists of a highly cost and power efficient programmable encoder which lies outside the cache access path, thus not affecting the processor cycle time. A detailed VLSI implementation has been performed and a number of experimental results on a set of embedded applications and numerical kernels is reported Energy dissipation decreases of up to 95% can be observed for the tag arrays, while significant energy reductions in the range of 10%-50% are observed when amortized across the overall cache subsystem.

[1]  Nikil D. Dutt,et al.  Low-power memory mapping through reducing address bus activity , 1999, IEEE Trans. Very Large Scale Integr. Syst..

[2]  William H. Mangione-Smith,et al.  The filter cache: an energy efficient memory structure , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[3]  Shoichiro Nakamura Applied numerical methods with software , 1991 .

[4]  Mahmut T. Kandemir,et al.  Energy-driven integrated hardware-software optimizations using SimplePower , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[5]  Carlos Delgado Kloos,et al.  Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition , 2002, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition.

[6]  Luca Benini,et al.  Dynamic voltage scaling and power management for portable systems , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[7]  Krste Asanovic,et al.  Dynamic zero compression for cache energy reduction , 2000, MICRO 33.

[8]  David H. Albonesi,et al.  Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[9]  K. Ghose,et al.  Analytical energy dissipation models for low power caches , 1997, Proceedings of 1997 International Symposium on Low Power Electronics and Design.

[10]  Ibrahim N. Hajj,et al.  Using dynamic cache management techniques to reduce energy in a high-performance processor , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[11]  Dirk Grunwald,et al.  Pipeline gating: speculation control for energy reduction , 1998, ISCA.

[12]  R. Stephany,et al.  A 200MHz 32b 0.5W CMOS RISC Microprocessor , 1998 .

[13]  BurgerDoug,et al.  The SimpleScalar tool set, version 2.0 , 1997 .

[14]  Norman P. Jouppi,et al.  An Integrated Cache Timing and Power Model , 2002 .

[15]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[16]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[17]  Kanad Ghose,et al.  Reducing power in superscalar processor caches using subbanking, multiple line buffers and bit-line segmentation , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).