MF-Net: Compute-In-Memory SRAM for Multibit Precision Inference Using Memory-Immersed Data Conversion and Multiplication-Free Operators

We propose a co-design approach for compute-in-memory inference for deep neural networks (DNN). We use multiplication-free function approximators based on the $\ell_1$ norm along with a co-adapted processing array and compute flow. Using this approach, we overcome many deficiencies in the current art of in-SRAM DNN processing, such as the need for digital-to-analog converters (DACs) at each operating SRAM row/column, the need for high-precision analog-to-digital converters (ADCs), limited support for multi-bit precision weights, and limited vector-scale parallelism. Our co-adapted implementation seamlessly extends to multi-bit precision weights, does not require DACs, and easily scales to higher vector-scale parallelism. We also propose an SRAM-immersed successive approximation ADC (SA-ADC) that exploits the parasitic capacitance of the SRAM array's bit lines as its capacitive DAC. Since the dominant area overhead of an SA-ADC comes from its capacitive DAC, exploiting the intrinsic parasitics of the SRAM array allows a low-area implementation of the within-SRAM SA-ADC. Our $8\times 62$ SRAM macro, which requires a 5-bit ADC, achieves ~105 tera operations per second per watt (TOPS/W) with 8-bit input/weight processing at 45 nm CMOS. Our $8\times 30$ SRAM macro, which requires a 4-bit ADC, achieves ~84 TOPS/W. SRAM macros that require lower ADC precision are more tolerant of process variability but also deliver lower TOPS/W. We evaluated the accuracy and performance of the proposed network on the MNIST, CIFAR10, and CIFAR100 datasets. We chose network configurations that adaptively mix multiplication-free and regular operators, using the multiplication-free operator for more than 85% of the total operations. The selected configurations achieve 98.6% accuracy on MNIST, 90.2% on CIFAR10, and 66.9% on CIFAR100. Since most operations in these configurations run on the proposed SRAM macros, the compute-in-memory efficiency benefits broadly translate to the system level.
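To make the abstract's central operator concrete, the following is a minimal sketch of a multiplication-free, $\ell_1$-norm-inducing operator of the sign(a·b)(|a|+|b|) form commonly used in the multiplication-free operator literature; the function names and exact sign convention here are illustrative assumptions, not the paper's implementation or its hardware mapping.

import numpy as np

def mf_op(a, b):
    # Multiplication-free elementary operator (assumed sign(a*b)*(|a|+|b|) form):
    # replaces the product a*b with sign and addition operations only.
    return np.sign(a) * np.sign(b) * (np.abs(a) + np.abs(b))

def mf_dot(x, w):
    # Multiplication-free analog of a dot product: accumulate mf_op over
    # the vector elements instead of performing multiply-accumulate.
    return np.sum(mf_op(x, w))

# The operator induces the l1 norm: mf_dot(x, x) = 2 * ||x||_1.
x = np.array([0.5, -1.0, 2.0])
w = np.array([1.0, 0.25, -0.75])
print(mf_dot(x, w))                          # multiplication-free "correlation" of x and w
print(mf_dot(x, x), 2 * np.sum(np.abs(x)))   # both evaluate to 7.0

Because the accumulation uses only sign manipulation and addition, a mixed-signal SRAM array can evaluate it without per-row/column DACs, which is the property the co-designed macro exploits.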
