A hybrid precision low power computing-in-memory architecture for neural networks

Abstract Recently, non-volatile memory-based computing-in-memory has been regarded as a promising candidate for ultra-low-power AI chips. Implementations based on both binarized (BIN) and multi-bit (MB) schemes have been proposed for DNNs/CNNs. However, both schemes face challenges in accuracy and power efficiency in practical use. This paper proposes a hybrid precision architecture and circuit-level techniques to overcome these challenges. Measured results show that a test chip based on the proposed architecture achieves (1) configurable precision ranging from binarized weights and inputs up to 8-bit inputs, 5-bit weights, and 7-bit outputs, (2) a reduction of accuracy loss by 86% to 96% across multiple complex CNNs, and (3) a power efficiency of 2.15 TOPS/W in a 0.22 μm CMOS process, which greatly reduces cost compared with digital designs of similar power efficiency. With a more advanced process, the architecture can achieve higher power efficiency; according to our estimation, over 20 TOPS/W is attainable in a 55 nm CMOS process.
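The precision limits reported for the test chip (8-bit inputs, 5-bit weights, 7-bit outputs) can be illustrated with a software model of a quantized multiply-and-accumulate operation. The sketch below is purely illustrative and is not the chip's actual circuit behavior: the quantization scheme (uniform symmetric), the wide intermediate accumulator, and the truncate-and-clip step that fits the result into a 7-bit output word are all assumptions chosen to make the bit-width trade-off concrete.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization to signed integers (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(float(np.max(np.abs(x))), 1e-12) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int64)
    return q, scale

def hybrid_mac(x, w, in_bits=8, w_bits=5, out_bits=7):
    """MAC with quantized operands and a clipped output word.

    Operands are quantized to the stated bit widths, accumulated at
    full precision, then truncated so the result fits in `out_bits`,
    mimicking a fixed-width output interface.
    """
    qx, sx = quantize(x, in_bits)
    qw, sw = quantize(w, w_bits)
    acc = int(np.dot(qx, qw))                        # wide accumulator
    qmax = 2 ** (out_bits - 1) - 1
    shift = max(acc.bit_length() - out_bits + 1, 0)  # fit into out_bits
    out = int(np.clip(acc >> shift, -qmax, qmax))    # truncated output word
    return out * (1 << shift) * sx * sw              # dequantized estimate
```

For example, `hybrid_mac(np.array([0.5, -0.25, 1.0]), np.array([0.1, 0.2, -0.3]))` returns a value close to the exact dot product (-0.3); the residual error is what the paper's accuracy-loss-reduction techniques would target in hardware.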
