MLFlash-CIM: Embedded Multi-Level NOR-Flash Cell based Computing in Memory Architecture for Edge AI Devices

Computing-in-Memory (CIM) is a promising approach to overcoming the well-known "Von Neumann bottleneck" by performing computation inside the memory array, which is especially attractive for edge artificial intelligence (AI) devices. In this paper, we propose a 40 nm 1 Mb multi-level NOR-Flash cell based CIM (MLFlash-CIM) architecture with hardware/software co-design. The proposed MLFlash-CIM is modeled and analyzed with consideration of cell variation, the number of activated cells, the integral non-linearity (INL) and differential non-linearity (DNL) of the input driver, and the quantization error of the readout circuits. We also propose a multi-bit neural network mapping method with 1/n top values and an adaptive quantization scheme to improve inference accuracy. When applied to a modified VGG-16 network with 16 layers, the proposed MLFlash-CIM achieves 92.73% inference accuracy on the CIFAR-10 dataset. The architecture also achieves a peak throughput of 3.277 TOPS and an energy efficiency of 35.6 TOPS/W for 4-bit multiply-and-accumulate (MAC) operations.
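To illustrate the kind of non-ideality modeling the abstract refers to, the following is a minimal behavioral sketch in Python of one analog MAC along a bit line, folding in cell-conductance variation, a mildly non-linear input driver (a stand-in for INL/DNL), and readout-ADC quantization. The function name `cim_mac`, the parameter names, and the specific error formulations (Gaussian cell variation, a cubic driver non-linearity, a uniform mid-rise ADC) are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def cim_mac(inputs, weights, levels=16, sigma_cell=0.02,
            inl_gain=0.01, adc_bits=8, rng=None):
    """Hypothetical behavioral model of one analog MAC on a bit line.

    inputs     : 1-D array of activations in [0, 1]
    weights    : 1-D array of signed weights in [-1, 1]
    levels     : number of programmable conductance levels per cell
    sigma_cell : relative std-dev of cell conductance variation (assumed Gaussian)
    inl_gain   : strength of an assumed cubic input-driver non-linearity
    adc_bits   : resolution of the readout ADC
    """
    rng = rng or np.random.default_rng()

    # Quantize weights to the available multi-level conductance states.
    w_q = np.round((weights + 1) / 2 * (levels - 1)) / (levels - 1) * 2 - 1

    # Add per-cell conductance variation.
    w_cell = w_q * (1 + rng.normal(0, sigma_cell, size=w_q.shape))

    # Input driver with a mild cubic non-linearity (stand-in for INL/DNL).
    x_drv = inputs + inl_gain * (inputs ** 3 - inputs)

    # Analog accumulation on the bit line (number of activated cells = len(inputs)).
    analog_sum = np.dot(x_drv, w_cell)

    # Readout ADC: scale to full range, quantize, and clip.
    full_scale = len(inputs)  # worst-case magnitude of the accumulated sum
    code = np.round(analog_sum / full_scale * (2 ** (adc_bits - 1)))
    code = np.clip(code, -(2 ** (adc_bits - 1)), 2 ** (adc_bits - 1) - 1)
    return code / (2 ** (adc_bits - 1)) * full_scale


# Usage: compare the modeled analog MAC against the ideal digital result.
x = np.random.rand(128)
w = np.random.uniform(-1, 1, 128)
print("ideal:", np.dot(x, w))
print("cim  :", cim_mac(x, w))
```

A sweep of `sigma_cell`, `adc_bits`, or the number of activated cells in such a model is one way to estimate how each non-ideality contributes to the end-to-end inference accuracy reported in the abstract.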
