Design Exploration of ReRAM-Based Crossbars for AI Inference

ReRAM-based crossbar designs with mixed-signal implementations have gained importance due to their low power, small area, low cost, and high throughput, especially for the multiply-and-accumulate operations at the core of AI applications. This paper provides a framework, with associated code, for analyzing the impact of ReRAM device variation and post-training analog conductance quantization, which has not been fully explored for pre-trained network accelerators. A detailed study with an end-to-end implementation is presented, ranging from mapping pre-trained DNN weights to quantized crossbar conductance values through to final classification in the presence of variation. Monte Carlo analysis was performed to assess the impact of the different parameters on the final accuracy without being device-specific. The study sweeps the conductance value variation, the conductance dynamic range, and the number of device quantization levels (QLs). The MNIST and CIFAR-10 data sets were used for the ANN and CNN cases, respectively. Results show that for a simple ANN, the accuracy drop due to quantization was ~2% at 64 QLs, whereas for the CNN the drop was around 10% with the same number of levels. Moreover, weight variation caused a ~5% drop in classification accuracy for the ANN at 5% variation and a ~8% drop for the CNN at 3% variation. The study confirms that increasing the number of levels under small variation yields near-optimal accuracy, although the accuracy gain saturates at an upper limit. The amount of distortion propagated through the layers differs between the two networks; it depends on the complexity of the input data and the network structure, such as the number of neurons in each layer, the number of layers, and the number of channels and filters at each stage. This contribution is the first to provide a framework for exploring the implications of device-independent post-training DNN quantization and weight variation on classification accuracy. This helps explore design trade-offs, especially for edge devices in cases where the end-user has no access to the training set for pre-trained networks due to security or cost constraints.
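To make the flow concrete, the sketch below illustrates the post-training pipeline the abstract describes: pre-trained weights are linearly mapped onto a bounded conductance range, snapped to a fixed number of QLs, perturbed with truncated-normal device variation, and mapped back to effective weights for inference inside a Monte Carlo loop. This is a minimal sketch, not the paper's released code; the linear mapping, the relative (conductance-proportional) variation model, and all names (`weights_to_conductance`, `monte_carlo_accuracy`, the `evaluate` callback, the default `g_min`/`g_max` range) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

def weights_to_conductance(w, g_min, g_max, n_levels):
    """Linearly map signed weights onto [g_min, g_max], then snap each
    value to the nearest of n_levels uniformly spaced conductance levels."""
    w_min, w_max = w.min(), w.max()
    g = g_min + (w - w_min) * (g_max - g_min) / (w_max - w_min)
    step = (g_max - g_min) / (n_levels - 1)
    g_q = g_min + np.round((g - g_min) / step) * step
    return g_q, (w_min, w_max)

def add_device_variation(g, sigma_rel, g_min, g_max):
    """Perturb each conductance with truncated-normal variation
    (std = sigma_rel * g), clipped so devices stay in the physical range."""
    sigma = sigma_rel * g
    lo = (g_min - g) / sigma   # standardized truncation bounds
    hi = (g_max - g) / sigma
    return truncnorm.rvs(lo, hi, loc=g, scale=sigma)

def conductance_to_weights(g, w_range, g_min, g_max):
    """Invert the linear mapping to obtain effective weights."""
    w_min, w_max = w_range
    return w_min + (g - g_min) * (w_max - w_min) / (g_max - g_min)

def monte_carlo_accuracy(w, evaluate, n_trials=100,
                         g_min=1e-6, g_max=1e-4, n_levels=64,
                         sigma_rel=0.05):
    """Quantize once, then re-sample device variation n_trials times and
    re-evaluate accuracy. `evaluate` is an assumed user-supplied routine
    that runs inference with the given effective weights on the test set."""
    g_q, w_range = weights_to_conductance(w, g_min, g_max, n_levels)
    accs = []
    for _ in range(n_trials):
        g_var = add_device_variation(g_q, sigma_rel, g_min, g_max)
        w_eff = conductance_to_weights(g_var, w_range, g_min, g_max)
        accs.append(evaluate(w_eff))
    return np.mean(accs), np.std(accs)
```

Under this setup, sweeping `n_levels` (e.g., 8 to 256) and `sigma_rel` (e.g., 0 to 0.05) per layer reproduces the kind of device-independent trade-off study described above: quantization error dominates at few QLs, while variation sets the accuracy ceiling once the level count saturates.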
