Paper Information - Generalized Water-Filling for Source-Aware Energy-Efficient SRAMs

Conventional low-power static random access memories (SRAMs) reduce read energy by decreasing the bit-line voltage swings uniformly across the bit-line columns, because read energy is proportional to the bit-line swings. However, the swings cannot be reduced arbitrarily: they must remain large enough to avoid decision errors, especially in the most significant bits. We propose a principled approach to determine optimal non-uniform bit-line swings by formulating convex optimization problems. For a given constraint on the mean squared error of retrieved words, we consider three criteria: minimizing energy (for low-power SRAMs), maximizing speed (for high-speed SRAMs), and minimizing the energy-delay product. These optimization problems can be interpreted as classical water-filling, ground-flattening and water-filling, and sand-pouring and water-filling, respectively. By leveraging these interpretations, we also propose greedy algorithms to obtain optimized discrete swings. Numerical results show that energy-optimal swing assignment reduces energy consumption by half at a peak signal-to-noise ratio of 30 dB for an 8-bit accessed word. For a 16-bit accessed word, the energy savings increase to a factor of four.
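
The greedy idea for discrete swings can be illustrated with a toy marginal-analysis sketch. It is not the paper's algorithm or circuit model. Everything below is an assumption made for illustration: bit b of a word flips with probability Q(swing_b / sigma) under Gaussian sensing noise, bits are independent and equally likely to be 0 or 1 (so the word MSE is the sum of 4^b times the flip probability), read energy is proportional to the sum of the swings, and swings lie on a uniform grid of size `step`. The names `greedy_swings`, `uniform_level`, `step`, `sigma`, and `max_level` are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def word_mse(levels, step, sigma):
    """Assumed MSE model: bit b (weight 2**b) flips with prob. Q(swing/sigma)."""
    return sum(4 ** b * q_function(n * step / sigma)
               for b, n in enumerate(levels))


def greedy_swings(num_bits, mse_target, step=0.1, sigma=1.0, max_level=100):
    """Raise one swing by one grid step at a time, always choosing the bit
    whose increment removes the most MSE, until the MSE target is met."""
    levels = [0] * num_bits  # swing of bit b is levels[b] * step

    def gain(b):
        n = levels[b]
        return 4 ** b * (q_function(n * step / sigma)
                         - q_function((n + 1) * step / sigma))

    while word_mse(levels, step, sigma) > mse_target:
        candidates = [b for b in range(num_bits) if levels[b] < max_level]
        if not candidates:
            raise ValueError("MSE target not reachable within max_level")
        levels[max(candidates, key=gain)] += 1
    return levels


def uniform_level(num_bits, mse_target, step=0.1, sigma=1.0, max_level=100):
    """Smallest common grid level that meets the same MSE target (baseline)."""
    for n in range(max_level + 1):
        if word_mse([n] * num_bits, step, sigma) <= mse_target:
            return n
    raise ValueError("MSE target not reachable within max_level")


if __name__ == "__main__":
    bits, psnr_db = 8, 30.0
    target = (2 ** bits - 1) ** 2 / 10 ** (psnr_db / 10)  # MSE at this PSNR
    levels = greedy_swings(bits, target)
    print("greedy levels (LSB first):", levels)
    print("total greedy steps:", sum(levels))
    print("total uniform steps:", bits * uniform_level(bits, target))
```

Because every grid step is assumed to cost the same energy and Q(x) is convex for x ≥ 0, choosing the largest MSE reduction at each step is the standard marginal-analysis rule for separable convex allocation. Under this toy model the more significant bits end up with swings at least as large as the less significant ones.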
