Power Estimator Development for Embedded System Memory Tuning

Memory accesses account for a large percentage of total power in microprocessor-based embedded systems. The increasing use of microprocessor cores and synthesis, rather than prefabricated microprocessor chips, creates the opportunity to tune a memory hierarchy to the one program that will execute in the embedded system. Such tuning requires fast and accurate estimation of the power and performance of different memory configurations. We describe a general three-step approach to developing such estimators, based on our experiences on several different projects. Each step is increasingly fast, using the previous step to gauge accuracy. The first step uses high-level functional simulation, the second step uses trace simulation, and the third step uses equations. A tool developer can follow these three steps to create a powerful environment for core users to support synthesis of the best memory hierarchy for a particular embedded system. The approach can be applied to components other than memory also.

[1]  Jörg Henkel,et al.  Code compression for low power embedded system design , 2000, Proceedings 37th Design Automation Conference.

[2]  Joseph A. Fisher Customized instruction-sets for embedded processors , 1999, DAC '99.

[3]  Alexander Chatzigeorgiou,et al.  Memory hierarchy exploration for low power architectures in embedded multimedia applications , 2001, ICIP.

[4]  Jorg Henkel,et al.  System-level exploration for pareto-optimal configurations in parameterized systems-on-a-chip , 2001, ICCAD 2001.

[5]  Frank Vahid,et al.  Interface and cache power exploration for core-based embedded system design , 1999, 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051).

[6]  C. Chakrabarti Cache design and exploration for low power embedded systems , 2001, Conference Proceedings of the 2001 IEEE International Performance, Computing, and Communications Conference (Cat. No.01CH37210).

[7]  Hiroto Yasuura,et al.  A power reduction technique with object code merging for application specific embedded processors , 2000, DATE '00.

[8]  Richard T. Witek,et al.  A 160 MHz 32 b 0.5 W CMOS RISC microprocessor , 1996, 1996 IEEE International Solid-State Circuits Conference. Digest of TEchnical Papers, ISSCC.

[9]  William H. Mangione-Smith,et al.  The filter cache: an energy efficient memory structure , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[10]  Bill Moyer,et al.  A low power unified cache architecture providing power and performance flexibility , 2000, ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514).

[11]  Jörg Henkel,et al.  Fast cache and bus power estimation for parameterized system-on-a-chip design , 2000, DATE '00.

[12]  Paolo Faraboschi,et al.  Custom-fit processors: letting applications define architectures , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[13]  Lea Hwang Lee,et al.  Low-Cost Embedded Program Loop Caching - Revisited , 1999 .

[14]  Taewhan Kim,et al.  Bus-invert coding for low-power I/O - a decomposition approach , 2000, Proceedings of the 43rd IEEE Midwest Symposium on Circuits and Systems (Cat.No.CH37144).

[15]  Norman P. Jouppi,et al.  CACTI 2.0: An Integrated Cache Timing and Power Model , 2002 .

[16]  Frank Vahid,et al.  Dynamic loop caching meets preloaded loop caching-a hybrid approach , 2002, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.

[17]  Jun Yang,et al.  Frequent value compression in data caches , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[18]  Luca Benini,et al.  Selective instruction compression for memory energy reduction in embedded systems , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[19]  Jorg Henkel,et al.  A/sup 2/BC: adaptive address bus coding for low power deep sub-micron designs , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[20]  Kazuaki Murakami,et al.  Way-predicting set-associative cache for high performance and low energy consumption , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[21]  David H. Albonesi,et al.  Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[22]  Radu Marculescu,et al.  Improving the efficiency of power simulators by input vector compaction , 1996, DAC '96.

[23]  Lea Hwang Lee,et al.  Designing the Low-Power MCORE TM Architecture , 1998 .

[24]  Alvin M. Despain,et al.  Cache designs for energy efficiency , 1995, Proceedings of the Twenty-Eighth Annual Hawaii International Conference on System Sciences.

[25]  Wayne H. Wolf,et al.  Iterative cache simulation of embedded CPUs with trace stripping , 1999, Proceedings of the Seventh International Workshop on Hardware/Software Codesign (CODES'99) (IEEE Cat. No.99TH8450).

[26]  P.P. Gelsinger,et al.  Microprocessors circa 2000 , 1989, IEEE Spectrum.

[27]  Frank Vahid,et al.  Exploiting Fixed Programs in Embedded Systems: A Loop Cache Example , 2002, IEEE Computer Architecture Letters.

[28]  Sharad Malik,et al.  Power analysis of embedded software: a first step towards software power minimization , 1994, IEEE Trans. Very Large Scale Integr. Syst..

[29]  Ricardo E. Gonzalez,et al.  Xtensa: A Configurable and Extensible Processor , 2000, IEEE Micro.

[30]  Jörg Henkel,et al.  Evaluating power consumption of parameterized cache and bus architectures in system-on-a-chip designs , 2001, IEEE Trans. Very Large Scale Integr. Syst..

[31]  Santosh G. Abraham,et al.  Efficient simulation of caches under optimal replacement with applications to miss characterization , 1993, SIGMETRICS '93.

[32]  Margaret Martonosi Improving Cache Power EÆciency with an Asymmetric Set-Associative Cache , 2001 .

[33]  John Arends,et al.  Instruction fetch energy reduction using loop caches for embedded applications with small tight loops , 1999, ISLPED '99.

[34]  Nikil D. Dutt,et al.  Architectural exploration and optimization of local memory in embedded systems , 1997, Proceedings. Tenth International Symposium on System Synthesis (Cat. No.97TB100114).

[35]  Mircea R. Stan,et al.  Bus-invert coding for low-power I/O , 1995, IEEE Trans. Very Large Scale Integr. Syst..

[36]  Mahmut T. Kandemir,et al.  Influence of compiler optimizations on system power , 2000, Proceedings 37th Design Automation Conference.

[37]  B. Ramakrishna Rau,et al.  Efficient design space exploration in PICO , 2000, CASES '00.

[38]  Luca Benini,et al.  Address bus encoding techniques for system-level power optimization , 1998, Proceedings Design, Automation and Test in Europe.

[39]  Jun Yang,et al.  FV encoding for low-power data I/O , 2001, ISLPED '01.

[40]  Mahmut T. Kandemir,et al.  Energy savings through compression in embedded Java environments , 2002, Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627).

[41]  Nikil Dutt Memory Organization and Exploration for Embedded Systems-on-Silicon , 1997 .

[42]  H. De Man,et al.  Optimization of memory organization and hierarchy for decreased size and power in video and image processing systems , 1995, Records of the 1995 IEEE International Workshop on Memory Technology, Design and Testing.

[43]  Frank Vahid,et al.  Platune: a tuning framework for system-on-a-chip platforms , 2002, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[44]  Jörg Henkel,et al.  Trace-driven system-level power evaluation of system-on-a-chip peripheral cores , 2001, ASP-DAC '01.

[45]  Jörg Henkel,et al.  Interface and cache power exploration for core-based embedded system design , 1999, ICCAD 1999.

[46]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[47]  Ibrahim N. Hajj,et al.  Energy and performance improvements in microprocessor design using a loop cache , 1999, Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040).

[48]  Chaitali Chakrabarti,et al.  Memory Design and Exploration for Low Power, Embedded Systems , 1999 .

[49]  Srilatha Manne,et al.  Power and performance tradeoffs using various caching strategies , 1998, Proceedings. 1998 International Symposium on Low Power Electronics and Design (IEEE Cat. No.98TH8379).

[50]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[51]  David A. Patterson,et al.  Computer Architecture - A Quantitative Approach, 5th Edition , 1996 .