MF-Net: Compute-In-Memory SRAM for Multibit Precision Inference Using Memory-Immersed Data Conversion and Multiplication-Free Operators

We propose a co-design approach for compute-in-memory inference for deep neural networks (DNN). We use multiplication-free function approximators based on the $\ell_1$ norm along with a co-adapted processing array and compute flow. Using this approach, we overcome many deficiencies in the current art of in-SRAM DNN processing, such as the need for digital-to-analog converters (DACs) at each operating SRAM row/column, the need for high-precision analog-to-digital converters (ADCs), limited support for multi-bit precision weights, and limited vector-scale parallelism. Our co-adapted implementation seamlessly extends to multi-bit precision weights, does not require DACs, and easily scales to higher vector-scale parallelism. We also propose an SRAM-immersed successive approximation ADC (SA-ADC) that exploits the parasitic capacitance of the SRAM array's bit lines as its capacitive DAC. Since the dominant area overhead of an SA-ADC comes from its capacitive DAC, exploiting the intrinsic parasitics of the SRAM array allows a low-area implementation of the within-SRAM SA-ADC. Our $8\times 62$ SRAM macro, which requires a 5-bit ADC, achieves ~105 tera operations per second per watt (TOPS/W) with 8-bit input/weight processing at 45 nm CMOS. Our $8\times 30$ SRAM macro, which requires a 4-bit ADC, achieves ~84 TOPS/W. SRAM macros that require lower ADC precision are more tolerant of process variability but also deliver lower TOPS/W. We evaluated the accuracy and performance of the proposed network on the MNIST, CIFAR10, and CIFAR100 datasets. We chose network configurations that adaptively mix multiplication-free and regular operators, using the multiplication-free operator for more than 85% of the total operations. The selected configurations achieve 98.6% accuracy on MNIST, 90.2% on CIFAR10, and 66.9% on CIFAR100. Since most operations in these configurations run on the proposed SRAM macros, the compute-in-memory efficiency benefits broadly translate to the system level.
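To make the abstract's central operator concrete, the following is a minimal sketch of a multiplication-free, $\ell_1$-norm-inducing operator of the sign(a·b)(|a|+|b|) form commonly used in the multiplication-free operator literature; the function names and exact sign convention here are illustrative assumptions, not the paper's implementation or its hardware mapping.

import numpy as np

def mf_op(a, b):
    # Multiplication-free elementary operator (assumed sign(a*b)*(|a|+|b|) form):
    # replaces the product a*b with sign and addition operations only.
    return np.sign(a) * np.sign(b) * (np.abs(a) + np.abs(b))

def mf_dot(x, w):
    # Multiplication-free analog of a dot product: accumulate mf_op over
    # the vector elements instead of performing multiply-accumulate.
    return np.sum(mf_op(x, w))

# The operator induces the l1 norm: mf_dot(x, x) = 2 * ||x||_1.
x = np.array([0.5, -1.0, 2.0])
w = np.array([1.0, 0.25, -0.75])
print(mf_dot(x, w))                          # multiplication-free "correlation" of x and w
print(mf_dot(x, x), 2 * np.sum(np.abs(x)))   # both evaluate to 7.0

Because the accumulation uses only sign manipulation and addition, a mixed-signal SRAM array can evaluate it without per-row/column DACs, which is the property the co-designed macro exploits.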
