High-Throughput In-Memory Computing for Binary Deep Neural Networks With Monolithically Integrated RRAM and 90-nm CMOS

Deep neural network (DNN) hardware designs have been bottlenecked by conventional memories such as SRAM, due to density, leakage, and parallel-computing challenges. Resistive devices can address the density and volatility issues but have been limited by peripheral circuit integration. In this work, we present a resistive RAM (RRAM)-based in-memory computing (IMC) design fabricated in 90-nm CMOS with monolithically integrated RRAM devices. We integrated a 128 × 64 RRAM array with CMOS peripheral circuits, including row/column decoders and flash analog-to-digital converters (ADCs), which together form a core building block for scalable RRAM-based IMC targeting large DNNs. To maximize IMC parallelism, we assert all 128 wordlines of the RRAM array simultaneously, perform analog computing along the bitlines, and digitize the bitline voltages using the ADCs. The resistance distribution of the low-resistance states is tightened by an iterative write-verify scheme. Prototype chip measurements demonstrate high binary DNN accuracy of 98.5% on MNIST and 83.5% on CIFAR-10, with 24 TOPS/W and 158 GOPS. This represents 22.3× and 10.1× improvements in throughput and energy-delay product (EDP), respectively, over the state-of-the-art literature, enabling intelligent functionality for area- and energy-constrained edge-computing devices.
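To make the dataflow concrete, the following is a minimal Python/NumPy sketch of the parallel bitline computation described above: binary weights stored as RRAM conductances, all 128 wordlines asserted at once, analog current summation on each bitline, and flash-ADC quantization of the result. The (+1, −1) weight and (+1, 0) input encoding, the differential-column mapping, and every numeric parameter (conductance levels, read voltage, ADC resolution) are illustrative assumptions for this sketch, not values reported for the chip; the iterative write-verify programming step is not modeled here.

```python
import numpy as np

# Illustrative parameters only; the chip's actual conductance levels,
# read voltage, and ADC resolution are not stated in the abstract.
G_LRS = 100e-6   # assumed low-resistance-state conductance (S)
G_HRS = 1e-6     # assumed high-resistance-state conductance (S)
V_READ = 0.2     # assumed wordline read voltage (V)
ADC_BITS = 3     # assumed flash-ADC resolution

def program_binary_weights(w):
    """Map binary weights (+1/-1) onto a differential pair of columns:
    +1 -> (LRS, HRS), -1 -> (HRS, LRS). The differential mapping is an
    assumption borrowed from similar XNOR-style RRAM macros."""
    g_pos = np.where(w == +1, G_LRS, G_HRS)
    g_neg = np.where(w == -1, G_LRS, G_HRS)
    return g_pos, g_neg

def bitline_mac(x, g_pos, g_neg):
    """All wordlines asserted simultaneously: binary inputs (+1/0) gate the
    read voltage, and each bitline accumulates current from every row."""
    v = V_READ * (x > 0)          # activated rows see V_READ, others 0 V
    i_pos = v @ g_pos             # current sum on the '+' bitlines (A)
    i_neg = v @ g_neg             # current sum on the '-' bitlines (A)
    return i_pos - i_neg          # differential MAC result per column

def flash_adc(i, i_min, i_max, bits=ADC_BITS):
    """Uniform quantizer standing in for the on-chip flash ADCs."""
    levels = 2 ** bits
    code = (i - i_min) / (i_max - i_min) * (levels - 1)
    return np.clip(np.round(code), 0, levels - 1).astype(int)

# One read cycle over a 128 x 64 array, as in the integrated macro.
rng = np.random.default_rng(0)
weights = rng.choice([-1, +1], size=(128, 64))   # binary DNN weights
inputs = rng.choice([0, +1], size=128)           # binary activations

g_pos, g_neg = program_binary_weights(weights)
i_diff = bitline_mac(inputs, g_pos, g_neg)
codes = flash_adc(i_diff, i_diff.min(), i_diff.max())
print(codes.shape)   # (64,) -> one digitized MAC result per column
```

In the actual macro, the current summation and quantization happen in the analog domain along each bitline; the matrix product above merely stands in for that physics to show how a single fully parallel read cycle yields 64 digitized MAC results at once.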
