10T SRAM Computing-in-Memory Macros for Binary and Multibit MAC Operation of DNN Edge Processors

Computing-in-memory (CIM) is a promising approach to reducing latency and improving the energy efficiency of multiply-and-accumulate (MAC) operations under the memory-wall constraint of artificial intelligence (AI) edge processors. This paper proposes scalable CIM designs based on a new ten-transistor (10T) static random access memory (SRAM) bit-cell. Using the proposed 10T SRAM bit-cell, we present two SRAM-based CIM (SRAM-CIM) macros supporting binary and multibit MAC operations. The first design achieves fully parallel computing and high throughput by performing 32 binary MAC operations in parallel. Advanced circuit techniques, such as an input-dependent dynamic reference generator and an input-boosted sense amplifier, are presented. Fabricated in a 28-nm CMOS process, this design achieves a throughput of 409.6 GOPS, an energy efficiency of 1001.7 TOPS/W, and an area efficiency of 169.9 TOPS/mm². The proposed approach effectively resolves problems of previous designs, such as write disturbance, limited throughput, and the power consumption of the analog-to-digital converter (ADC). The second design supports multibit MAC operation (4-b weight, 4-b input, and 8-b output) to increase inference accuracy. We propose an architecture that divides the 4-b weight by 4-b input multiplication into four parallel 2-b multiplications, which increases the signal margin by $16\times$ compared with conventional 4-b multiplication. In addition, the area overhead of the capacitive digital-to-analog converter (CDAC) is effectively addressed by exploiting the intrinsic bit-line capacitance of the SRAM-CIM architecture. The proposed approach of realizing four parallel 2-b multiplications using the CDAC is successfully demonstrated with a modified LeNet-5 neural network. These results indicate that the proposed 10T bit-cell is promising for robust and scalable SRAM-CIM designs, which are essential for fully parallel edge computing.
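The two MAC schemes described above can be sketched functionally. The following is a minimal illustration, not the paper's circuit implementation: it assumes binary MAC over ±1 values (the usual XNOR/popcount formulation) and shows that splitting each 4-b operand into 2-b halves and recombining four 2-b partial products with shift weights reproduces the full 4-b product. All function names are illustrative.

```python
def binary_mac(inputs, weights):
    """Binary MAC over +1/-1 activations and weights.
    For {+1, -1} encodings, XNOR of the sign bits is equivalent
    to the arithmetic product, so this models the 32-way parallel
    binary MAC functionally."""
    return sum(x * w for x, w in zip(inputs, weights))

def split_4b_to_2b(v):
    """Split a 4-bit unsigned value into (high, low) 2-bit halves."""
    assert 0 <= v < 16, "operand must fit in 4 bits"
    return v >> 2, v & 0b11

def mac_4b_via_2b(inputs, weights):
    """4-b x 4-b MAC computed from four 2-b x 2-b multiplications.
    With x = 4*xh + xl and w = 4*wh + wl:
        x*w = 16*xh*wh + 4*xh*wl + 4*xl*wh + xl*wl
    Each 2-b x 2-b partial product spans only 16 levels instead of
    the 256 levels of a direct 8-b product, which is the source of
    the 16x signal-margin improvement claimed in the abstract."""
    acc = 0
    for x, w in zip(inputs, weights):
        xh, xl = split_4b_to_2b(x)
        wh, wl = split_4b_to_2b(w)
        acc += (xh * wh << 4) + (xh * wl << 2) + (xl * wh << 2) + (xl * wl)
    return acc
```

A quick check: `mac_4b_via_2b([5, 12], [3, 7])` equals `5*3 + 12*7 = 99`, matching a direct multibit MAC.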
