Memory Design and Exploration for Low Power, Embedded Systems

In this paper, we describe a procedure for memory design and exploration for low power embedded systems. Our system consists of an instruction cache and a data cache on-chip, and a large memory off-chip. In the first step, we try to reduce the power consumption due to memory traffic by applying memory-optimizing transformations such as loop transformations. Next we use a memory exploration procedure to choose a cache configuration (cache size and line size) that satisfies the system requirements of area, number of cycles and energy consumption. We include energy in the performance metrics, since for different cache configurations, the variation in energy consumption is quite different from the variation in the number of cycles. The memory exploration procedure is very efficient since it exploits the trends in the cycles and energy characteristics to reduce the search space significantly.

[1]  David F. Bacon,et al.  Compiler transformations for high-performance computing , 1994, CSUR.

[2]  Norman P. Jouppi,et al.  WRL Research Report 93/5: An Enhanced Access and Cycle Time Model for On-chip Caches , 1994 .

[3]  Miodrag Potkonjak,et al.  Application-driven synthesis of core-based systems , 1997, ICCAD 1997.

[4]  Mahmut T. Kandemir,et al.  The design and use of simplePower: a cycle-accurate energy estimation tool , 2000, Proceedings 37th Design Automation Conference.

[5]  Monica S. Lam,et al.  A data locality optimizing algorithm , 1991, PLDI '91.

[6]  Hugo De Man,et al.  Global Communication and Memory Optimizing Transformations for Low Power Systems , 1994 .

[7]  Jörg Henkel,et al.  A framework for estimation and minimizing energy dissipation of embedded HW/SW systems , 1998, DAC.

[8]  Nikil D. Dutt,et al.  Memory data organization for improved cache performance in embedded processor applications , 1997, TODE.

[9]  Nikil D. Dutt,et al.  Architectural exploration and optimization of local memory in embedded systems , 1997, Proceedings. Tenth International Symposium on System Synthesis (Cat. No.97TB100114).

[10]  Alvin M. Despain,et al.  Cache design trade-offs for power and performance optimization: a case study , 1995, ISLPED '95.

[11]  Andrew Wolfe,et al.  A methodology to evaluate memory architecture design tradeoffs for video signal processors , 1998, IEEE Trans. Circuits Syst. Video Technol..

[12]  Vivek Sarkar,et al.  A general framework for iteration-reordering loop transformations , 1992, PLDI '92.

[13]  Francky Catthoor,et al.  Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design , 1998 .

[14]  Jörg Henkel,et al.  Interface and cache power exploration for core-based embedded system design , 1999, ICCAD 1999.

[15]  Luca Benini,et al.  Cycle-accurate simulation of energy consumption in embedded systems , 1999, DAC '99.

[16]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[17]  Mahmut T. Kandemir,et al.  Influence of compiler optimizations on system power , 2000, Proceedings 37th Design Automation Conference.

[18]  Nikil D. Dutt,et al.  Local memory exploration and optimization in embedded systems , 1999, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[19]  Michael Wolfe,et al.  High performance compilers for parallel computing , 1995 .

[20]  Kanad Ghose,et al.  Analytical energy dissipation models for low-power caches , 1997, ISLPED '97.

[21]  Chaitali Chakrabarti,et al.  Memory exploration for low power, embedded systems , 1999, DAC '99.

[22]  H. De Man,et al.  Global communication and memory optimizing transformations for low power signal processing systems , 1994, Proceedings of 1994 IEEE Workshop on VLSI Signal Processing.

[23]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[24]  David A. Patterson,et al.  Computer Architecture - A Quantitative Approach, 5th Edition , 1996 .

[25]  Monica S. Lam,et al.  A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst..