A Dual-Split 6T SRAM-Based Computing-in-Memory Unit-Macro With Fully Parallel Product-Sum Operation for Binarized DNN Edge Processors

Computing-in-memory (CIM) is a promising approach to reducing the latency and improving the energy efficiency of deep neural network (DNN) artificial intelligence (AI) edge processors. However, SRAM-based CIM (SRAM-CIM) faces practical challenges in area overhead, performance, energy efficiency, and yield against variations in data patterns and transistor performance. This paper employs a circuit-system co-design methodology to develop an SRAM-CIM unit-macro for the binary-based fully connected neural network (FCNN) layers of DNN AI edge processors. The proposed SRAM-CIM unit-macro supports two binarized neural network models: an XNOR neural network (XNORNN) and a modified binary neural network (MBNN). To achieve compact area, fast access time, robust operation, and high energy efficiency, the proposed SRAM-CIM uses a split-wordline compact-rule 6T SRAM cell together with several circuit techniques: a dynamic input-aware reference generation (DIARG) scheme, an algorithm-dependent asymmetric control (ADAC) scheme, a write disturb-free (WDF) scheme, and a common-mode-insensitive small-offset voltage-mode sensing amplifier (CMI-VSA). A fabricated 65-nm 4-Kb SRAM-CIM unit-macro achieved product-sum access times of 2.4 ns in XNORNN mode and 2.3 ns in MBNN mode for an FCNN layer. The measured maximum energy efficiency reached 30.49 TOPS/W in XNORNN mode and 55.8 TOPS/W in MBNN mode.
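For concreteness, the XNORNN-mode product-sum that the macro evaluates in parallel can be described behaviorally by the standard XNOR-popcount identity used in binarized networks: with inputs and weights in {-1, +1} encoded as bits (1 maps to +1, 0 to -1), the dot product equals 2·popcount(XNOR(a, w)) − N. The short Python sketch below is illustrative only (it is not the paper's circuit or code, and all names are ours); it checks the identity against direct ±1 arithmetic. The MBNN mode uses a modified binary encoding that further simplifies the cell-level operation; its exact encoding follows the paper rather than this sketch.

```python
import numpy as np

def xnor_product_sum(a_bits: np.ndarray, w_bits: np.ndarray) -> int:
    """Product-sum of two +/-1 vectors stored as {0,1} bits (1 -> +1, 0 -> -1).

    Uses the standard identity: for a, w in {-1,+1}^N,
        sum_i a_i * w_i = 2 * popcount(XNOR(a, w)) - N.
    """
    n = a_bits.size
    xnor = (~(a_bits ^ w_bits)) & 1   # bitwise XNOR restricted to {0,1}
    return 2 * int(xnor.sum()) - n

# Self-check against direct +/-1 arithmetic on a random 64-input neuron.
rng = np.random.default_rng(0)
a_bits = rng.integers(0, 2, size=64)
w_bits = rng.integers(0, 2, size=64)
a = 2 * a_bits - 1   # decode {0,1} -> {-1,+1}
w = 2 * w_bits - 1
assert xnor_product_sum(a_bits, w_bits) == int((a * w).sum())
```

In a CIM macro, the popcount is realized physically (many cells discharging a shared line at once) rather than digitally, which is what makes the fully parallel product-sum possible.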
