Generalized Water-Filling for Source-Aware Energy-Efficient SRAMs

Conventional low-power static random access memories (SRAMs) reduce read energy by decreasing the bit-line voltage swing uniformly across all bit-line columns, since read energy is proportional to the bit-line swing. The swing cannot be reduced arbitrarily, however, because decision errors must be avoided, especially in the most significant bits. We propose a principled approach that determines optimal non-uniform bit-line swings by formulating convex optimization problems. For a given constraint on the mean squared error of retrieved words, we consider three criteria: minimizing energy (for low-power SRAMs), maximizing speed (for high-speed SRAMs), and minimizing the energy-delay product. These optimization problems can be interpreted as classical water-filling, ground-flattening and water-filling, and sand-pouring and water-filling, respectively. Leveraging these interpretations, we also propose greedy algorithms to obtain optimized discrete swings. Numerical results show that energy-optimal swing assignment halves energy consumption at a peak signal-to-noise ratio of 30 dB for an 8-bit accessed word. For a 16-bit accessed word, the energy savings increase to a factor of four.
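
The greedy idea for discrete swings can be illustrated with a minimal sketch. It is not the algorithm from the paper; it rests on assumptions that the abstract does not state: sensing noise is Gaussian with standard deviation `sigma`, so a bit read with swing `s` is in error with probability Q(s / sigma); an error in bit `b` contributes `4**b` to the mean squared error; read energy is proportional to the sum of the swings; and swings are multiples of a fixed `step`. The function names `q_func`, `word_mse`, and `greedy_swings` are hypothetical. Each iteration spends one more swing step on the bit whose error reduction is largest, which is the marginal-analysis rule for separable convex objectives.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def word_mse(swings, sigma):
    """MSE of a retrieved word under the assumed error model.

    swings[b] is the swing of bit b, with b = 0 the least significant bit.
    """
    return sum((4 ** b) * q_func(s / sigma) for b, s in enumerate(swings))


def greedy_swings(num_bits, sigma, step, mse_target, max_steps=10_000):
    """Add one swing step at a time to the bit with the largest MSE reduction.

    Stops as soon as the MSE constraint is met and returns the swings.
    """
    counts = [0] * num_bits  # integer step counts avoid float accumulation

    def gain(b):
        # MSE reduction obtained by giving bit b one more step of swing.
        now = q_func(counts[b] * step / sigma)
        nxt = q_func((counts[b] + 1) * step / sigma)
        return (4 ** b) * (now - nxt)

    for _ in range(max_steps):
        swings = [c * step for c in counts]
        if word_mse(swings, sigma) <= mse_target:
            return swings
        best = max(range(num_bits), key=gain)
        counts[best] += 1
    raise ValueError("MSE target not reached within max_steps")


if __name__ == "__main__":
    # A PSNR of 30 dB for an 8-bit word corresponds to MSE = 255^2 / 10^3.
    target = 255 ** 2 / 10 ** 3
    print(greedy_swings(num_bits=8, sigma=1.0, step=0.25, mse_target=target))
```

Under these assumptions the more significant bits end up with larger swings, while a uniform assignment would have to give every column the swing that the most significant bit needs.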
