MLFlash-CIM: Embedded Multi-Level NOR-Flash Cell based Computing in Memory Architecture for Edge AI Devices

Computing-in-Memory (CIM) is a promising approach to overcoming the well-known "Von Neumann bottleneck" by performing computation inside the memory array, which is especially attractive for edge artificial intelligence (AI) devices. In this paper, we propose a 40 nm 1 Mb multi-level NOR-Flash cell based CIM (MLFlash-CIM) architecture with hardware/software co-design. The proposed MLFlash-CIM is modeled and analyzed with consideration of cell variation, the number of activated cells, the integral non-linearity (INL) and differential non-linearity (DNL) of the input driver, and the quantization error of the readout circuits. We also propose a multi-bit neural network mapping method with 1/n top values and an adaptive quantization scheme to improve inference accuracy. When applied to a modified VGG-16 network with 16 layers, the proposed MLFlash-CIM achieves 92.73% inference accuracy on the CIFAR-10 dataset. The architecture also achieves a peak throughput of 3.277 TOPS and an energy efficiency of 35.6 TOPS/W for 4-bit multiply-and-accumulate (MAC) operations.
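To illustrate the kind of non-ideality modeling the abstract refers to, the following is a minimal behavioral sketch in Python of one analog MAC along a bit line, folding in cell-conductance variation, a mildly non-linear input driver (a stand-in for INL/DNL), and readout-ADC quantization. The function name `cim_mac`, the parameter names, and the specific error formulations (Gaussian cell variation, a cubic driver non-linearity, a uniform mid-rise ADC) are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def cim_mac(inputs, weights, levels=16, sigma_cell=0.02,
            inl_gain=0.01, adc_bits=8, rng=None):
    """Hypothetical behavioral model of one analog MAC on a bit line.

    inputs     : 1-D array of activations in [0, 1]
    weights    : 1-D array of signed weights in [-1, 1]
    levels     : number of programmable conductance levels per cell
    sigma_cell : relative std-dev of cell conductance variation (assumed Gaussian)
    inl_gain   : strength of an assumed cubic input-driver non-linearity
    adc_bits   : resolution of the readout ADC
    """
    rng = rng or np.random.default_rng()

    # Quantize weights to the available multi-level conductance states.
    w_q = np.round((weights + 1) / 2 * (levels - 1)) / (levels - 1) * 2 - 1

    # Add per-cell conductance variation.
    w_cell = w_q * (1 + rng.normal(0, sigma_cell, size=w_q.shape))

    # Input driver with a mild cubic non-linearity (stand-in for INL/DNL).
    x_drv = inputs + inl_gain * (inputs ** 3 - inputs)

    # Analog accumulation on the bit line (number of activated cells = len(inputs)).
    analog_sum = np.dot(x_drv, w_cell)

    # Readout ADC: scale to full range, quantize, and clip.
    full_scale = len(inputs)  # worst-case magnitude of the accumulated sum
    code = np.round(analog_sum / full_scale * (2 ** (adc_bits - 1)))
    code = np.clip(code, -(2 ** (adc_bits - 1)), 2 ** (adc_bits - 1) - 1)
    return code / (2 ** (adc_bits - 1)) * full_scale


# Usage: compare the modeled analog MAC against the ideal digital result.
x = np.random.rand(128)
w = np.random.uniform(-1, 1, 128)
print("ideal:", np.dot(x, w))
print("cim  :", cim_mac(x, w))
```

A sweep of `sigma_cell`, `adc_bits`, or the number of activated cells in such a model is one way to estimate how each non-ideality contributes to the end-to-end inference accuracy reported in the abstract.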
