A Dual-Split 6T SRAM-Based Computing-in-Memory Unit-Macro With Fully Parallel Product-Sum Operation for Binarized DNN Edge Processors

Computing-in-memory (CIM) is a promising approach to reducing the latency and improving the energy efficiency of deep neural network (DNN) artificial intelligence (AI) edge processors. However, SRAM-based CIM (SRAM-CIM) faces practical challenges in area overhead, performance, energy efficiency, and yield against variations in data patterns and transistor performance. This paper employs a circuit-system co-design methodology to develop an SRAM-CIM unit-macro for the binary-based fully connected neural network (FCNN) layers of DNN AI edge processors. The proposed SRAM-CIM unit-macro supports two binarized neural network models: an XNOR neural network (XNORNN) and a modified binary neural network (MBNN). To achieve compact area, fast access time, robust operation, and high energy efficiency, the proposed SRAM-CIM uses a split-wordline compact-rule 6T SRAM cell together with several circuit techniques: a dynamic input-aware reference generation (DIARG) scheme, an algorithm-dependent asymmetric control (ADAC) scheme, a write disturb-free (WDF) scheme, and a common-mode-insensitive small-offset voltage-mode sensing amplifier (CMI-VSA). A fabricated 65-nm 4-Kb SRAM-CIM unit-macro achieved product-sum access times of 2.4 ns in XNORNN mode and 2.3 ns in MBNN mode for an FCNN layer. The measured maximum energy efficiency reached 30.49 TOPS/W in XNORNN mode and 55.8 TOPS/W in MBNN mode.
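For concreteness, the XNORNN-mode product-sum that the macro evaluates in parallel can be described behaviorally by the standard XNOR-popcount identity used in binarized networks: with inputs and weights in {-1, +1} encoded as bits (1 maps to +1, 0 to -1), the dot product equals 2·popcount(XNOR(a, w)) − N. The short Python sketch below is illustrative only (it is not the paper's circuit or code, and all names are ours); it checks the identity against direct ±1 arithmetic. The MBNN mode uses a modified binary encoding that further simplifies the cell-level operation; its exact encoding follows the paper rather than this sketch.

```python
import numpy as np

def xnor_product_sum(a_bits: np.ndarray, w_bits: np.ndarray) -> int:
    """Product-sum of two +/-1 vectors stored as {0,1} bits (1 -> +1, 0 -> -1).

    Uses the standard identity: for a, w in {-1,+1}^N,
        sum_i a_i * w_i = 2 * popcount(XNOR(a, w)) - N.
    """
    n = a_bits.size
    xnor = (~(a_bits ^ w_bits)) & 1   # bitwise XNOR restricted to {0,1}
    return 2 * int(xnor.sum()) - n

# Self-check against direct +/-1 arithmetic on a random 64-input neuron.
rng = np.random.default_rng(0)
a_bits = rng.integers(0, 2, size=64)
w_bits = rng.integers(0, 2, size=64)
a = 2 * a_bits - 1   # decode {0,1} -> {-1,+1}
w = 2 * w_bits - 1
assert xnor_product_sum(a_bits, w_bits) == int((a * w).sum())
```

In a CIM macro, the popcount is realized physically (many cells discharging a shared line at once) rather than digitally, which is what makes the fully parallel product-sum possible.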
