A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors

In current embedded processors for media applications, the instruction memory hierarchy consumes up to 30% of total processor power. In this context, we present an inherently low-energy clustered instruction memory hierarchy template. Small instruction memories are distributed over groups of functional units, and the interconnects are localized to minimize energy consumption. Furthermore, we present a simple profile-based algorithm that optimally synthesizes the L0 clusters for a given application. Using a few representative multimedia benchmarks, we show that our clustering approach reduces L0 buffer energy by up to 45%.
