Energy-aware design of embedded memories: A survey of technologies, architectures, and optimization techniques

Embedded systems are often designed under stringent energy budgets to limit heat generation and battery size. Because memory systems consume a significant amount of energy to store and forward data, memory system design must balance power consumption against performance. Contemporary system design focuses on the trade-off between performance and energy consumption in processing and storage units, as well as in their interconnections. Although memory design is as important as processor design in meeting these objectives, it has received less attention in the literature. This article centers on energy-aware memory design, one of the most pressing problems in chip design for embedded applications. It guides the reader through different memory technologies and architectures, and it reviews the most successful strategies for optimizing them in the power/performance plane.
