Synthesis of customized loop caches for core-based embedded systems

Embedded system programs tend to spend much time in small loops. Introducing a very small loop cache into the instruction memory hierarchy has thus been shown to substantially reduce instruction fetch energy. However, loop caches come in many sizes and variations -- using the configuration best on the average may actually result in worsened energy for a specific program. We therefore introduce a loop cache exploration tool that analyzes a particular program's profile, rapidly explores the possible configurations, and generates the configuration with the greatest power savings. We introduce a simulation-based approach and show the good energy savings that a customized loop cache yields. We also introduce a fast estimation-based approach that obtains nearly the same results in seconds rather than tens of minutes or hours.

[1]  Chaitali Chakrabarti,et al.  Memory Design and Exploration for Low Power, Embedded Systems , 1999 .

[2]  Luca Benini,et al.  Asymptotic zero-transition activity encoding for address busses in low-power microprocessor-based systems , 1997, Proceedings Great Lakes Symposium on VLSI.

[3]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[4]  B. Ramakrishna Rau,et al.  Automatic architectural synthesis of VLIW and EPIC processors , 1999, Proceedings 12th International Symposium on System Synthesis.

[5]  Mary Jane Irwin,et al.  An extended addressing mode for low power , 1997, Proceedings of 1997 International Symposium on Low Power Electronics and Design.

[6]  Srilatha Manne,et al.  Power and performance tradeoffs using various caching strategies , 1998, Proceedings. 1998 International Symposium on Low Power Electronics and Design (IEEE Cat. No.98TH8379).

[7]  U. Ko,et al.  Characterization and design of a low-power, high-performance cache architecture , 1995, 1995 International Symposium on VLSI Technology, Systems, and Applications. Proceedings of Technical Papers.

[8]  Richard T. Witek,et al.  A 160 MHz 32 b 0.5 W CMOS RISC microprocessor , 1996, 1996 IEEE International Solid-State Circuits Conference. Digest of TEchnical Papers, ISSCC.

[9]  Darko Kirovski,et al.  Procedure Based Program Compression , 2004, International Journal of Parallel Programming.

[10]  Frank Vahid,et al.  Exploiting Fixed Programs in Embedded Systems: A Loop Cache Example , 2002, IEEE Computer Architecture Letters.

[11]  Jörg Henkel,et al.  Code compression for low power embedded system design , 2000, Proceedings 37th Design Automation Conference.

[12]  Alexander Chatzigeorgiou,et al.  Memory hierarchy exploration for low power architectures in embedded multimedia applications , 2001, ICIP.

[13]  Mahmut T. Kandemir,et al.  Power-aware partitioned cache architectures , 2001, ISLPED '01.

[14]  H. De Man,et al.  Optimization of memory organization and hierarchy for decreased size and power in video and image processing systems , 1995, Records of the 1995 IEEE International Workshop on Memory Technology, Design and Testing.

[15]  Frank Vahid,et al.  Platform Tuning for Embedded Systems Design , 2001, Computer.

[16]  Wayne H. Wolf,et al.  Iterative cache simulation of embedded CPUs with trace stripping , 1999, Proceedings of the Seventh International Workshop on Hardware/Software Codesign (CODES'99) (IEEE Cat. No.99TH8450).

[17]  Paolo Faraboschi,et al.  Custom-fit processors: letting applications define architectures , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[18]  Lea Hwang Lee,et al.  Low-Cost Embedded Program Loop Caching - Revisited , 1999 .

[19]  Ed F. Deprettere,et al.  An Approach for Quantitative Analysis of Application-Specific Dataflow Architectures , 1997, ASAP.

[20]  Luca Benini,et al.  Selective instruction compression for memory energy reduction in embedded systems , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[21]  J. A. Fisher,et al.  Customized instruction-sets for embedded processors , 1999, Proceedings 1999 Design Automation Conference (Cat. No. 99CH36361).

[22]  Mary Jane Irwin,et al.  Some issues in gray code addressing , 1996, Proceedings of the Sixth Great Lakes Symposium on VLSI.

[23]  Chi-Ying Tsui,et al.  Saving power in the control path of embedded processors , 1994, IEEE Design & Test of Computers.

[24]  William H. Mangione-Smith,et al.  The filter cache: an energy efficient memory structure , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[25]  Nikil D. Dutt,et al.  Architectural exploration and optimization of local memory in embedded systems , 1997, Proceedings. Tenth International Symposium on System Synthesis (Cat. No.97TB100114).

[26]  Alvin M. Despain,et al.  Cache design trade-offs for power and performance optimization: a case study , 1995, ISLPED '95.

[27]  M. Kandemir,et al.  Power-aware partitioned cache architectures , 2001, ISLPED'01: Proceedings of the 2001 International Symposium on Low Power Electronics and Design (IEEE Cat. No.01TH8581).

[28]  Frank Vahid,et al.  A Study on the Loop Behavior of Embedded Programs , 2002 .

[29]  Frank Vahid,et al.  Improving Software Performance with Configurable Logic , 2002, Des. Autom. Embed. Syst..

[30]  Mircea R. Stan,et al.  Bus-invert coding for low-power I/O , 1995, IEEE Trans. Very Large Scale Integr. Syst..

[31]  S. Abraham,et al.  Eecient Simulation of Multiple Cache Conngurations Using Binomial Trees , 1991 .

[32]  B. Moyer,et al.  Instruction fetch energy reduction using loop caches for embedded applications with small tight loops , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[33]  Bill Moyer,et al.  A low power unified cache architecture providing power and performance flexibility , 2000, ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514).

[34]  Ricardo E. Gonzalez,et al.  Xtensa: A Configurable and Extensible Processor , 2000, IEEE Micro.