C3SRAM: An In-Memory-Computing SRAM Macro Based on Robust Capacitive Coupling Computing Mechanism

This article presents C3SRAM, an in-memory-computing SRAM macro. The macro is an SRAM module with circuits embedded in its bitcells and peripherals to accelerate neural networks with binarized weights and activations. It uses analog-mixed-signal (AMS) capacitive-coupling computing to evaluate the core computation of binary neural networks, the binary multiply-and-accumulate (MAC) operation. Rather than accessing the stored weights row by row, the macro asserts all of its rows simultaneously and forms an analog voltage at the read-bitline node through capacitive voltage division. With one analog-to-digital converter (ADC) per column, the macro performs fully parallel vector–matrix multiplication in a single cycle. The network type that the macro supports and the computing mechanism it utilizes are determined by the robustness and error tolerance necessary in AMS computing. The C3SRAM macro is prototyped in 65-nm CMOS. It demonstrates an energy efficiency of 672 TOPS/W and a throughput of 1638 GOPS (20.2 TOPS/mm²), achieving a 3975× better energy–delay product than a conventional digital baseline performing the same operation. The macro achieves 98.3% accuracy on MNIST and 85.5% on CIFAR-10, placing it among the best reported in-memory-computing designs in the tradeoff between energy efficiency and inference accuracy.
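
To make the computing mechanism concrete, the following Python sketch behaviorally models one column's binary MAC as ideal capacitive voltage division followed by an ideal per-column ADC. It is a minimal illustration only; the supply voltage, row count, and ADC resolution are assumed values for the example, not measurements or design parameters from the prototype.

```python
import numpy as np

# Illustrative, assumed parameters (not from the paper's silicon measurements).
VDD = 1.0          # voltage coupled by a bitcell computing +1 (V)
N_ROWS = 256       # rows asserted simultaneously (one capacitor per bitcell)
ADC_BITS = 5       # resolution of the per-column ADC (assumed)

def binary_mac_column(weights, activations):
    """Behavioral model of one column's binary multiply-and-accumulate.

    weights, activations: arrays of +1/-1. Each bitcell performs a binary
    multiply (an XNOR in the +1/-1 domain) and couples either VDD or 0 V
    onto the read bitline through its capacitor; with equal capacitors the
    bitline settles to the average of the coupled voltages (capacitive
    voltage division). A per-column ADC then digitizes that voltage.
    """
    products = weights * activations               # +1/-1 bitwise multiplies
    ones = np.count_nonzero(products == 1)         # cells coupling VDD
    v_rbl = ones / len(products) * VDD             # capacitive voltage division
    # Ideal ADC: quantize the bitline voltage to 2**ADC_BITS levels.
    code = int(round(v_rbl / VDD * (2**ADC_BITS - 1)))
    # Map the code back to the signed MAC value sum(w_i * x_i).
    ones_est = code / (2**ADC_BITS - 1) * len(products)
    return 2 * ones_est - len(products)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.choice([-1, 1], size=N_ROWS)
    x = rng.choice([-1, 1], size=N_ROWS)
    print("exact MAC  :", int(np.sum(w * x)))
    print("modeled MAC:", binary_mac_column(w, x))
```

Because every row is asserted at once, one such evaluation per column corresponds to a full vector–matrix multiplication in a single cycle; the residual difference between the exact and modeled MAC values in this sketch reflects only ADC quantization, whereas the paper additionally addresses analog nonidealities through the robustness of the binarized network.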
