An Energy-Efficient Quantized and Regularized Training Framework for Processing-In-Memory Accelerators

Convolutional Neural Networks (CNNs) have achieved breakthroughs in various fields, but their energy consumption has become enormous. Processing-In-Memory (PIM) architectures based on emerging non-volatile memory (e.g., Resistive Random Access Memory, RRAM) have demonstrated great potential for improving the energy efficiency of CNN computing. However, there is still much room for improvement in the energy efficiency of existing PIM architectures. On the one hand, prior work shows that high-resolution Analog-to-Digital Converters (ADCs) are required to maintain computing accuracy, yet they account for more than 60% of the entire system's energy consumption, eroding the energy-efficiency benefits of PIM. On the other hand, because PIM accelerators compute in the analog domain, their computing energy consumption depends on the specific input and weight values; to the best of our knowledge, however, no existing work exploits this characteristic for energy-efficiency optimization. To address these problems, we propose an energy-efficient quantized and regularized training framework for PIM accelerators, consisting of a PIM-based non-uniform activation quantization scheme and an energy-aware weight regularization method. The proposed framework improves the energy efficiency of PIM architectures by reducing the required ADC resolution and by training low-energy CNN models for PIM, with little accuracy loss. Experimental results show that the proposed training framework reduces ADC resolution by 2 bits and analog-domain computing energy by 35%, improving overall energy efficiency by $3.4\times$.
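Since the abstract names the two components but not their internals, the sketch below illustrates one plausible reading in PyTorch: a non-uniform activation quantizer whose levels cluster near zero (so fewer ADC bits can cover the dynamic range) and an L1-style weight penalty standing in for an analog-energy model (RRAM crossbar column currents grow with cell conductance, which roughly tracks weight magnitude). All names (`nonuniform_quantize`, `QuantReLU`, `analog_energy_reg`, `lam`) and the specific power-law level spacing are illustrative assumptions, not the paper's actual method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def nonuniform_quantize(x, n_bits=4, gamma=2.0):
    """Quantize non-negative activations to 2**n_bits non-uniform levels.

    Power-law spacing (gamma > 1) packs levels densely near zero, one way a
    non-uniform scheme could preserve accuracy while a lower-resolution ADC
    senses the analog dot-product results. The spacing itself is an assumption.
    """
    x = x.clamp(min=0)                              # assume post-ReLU inputs
    scale = x.detach().max().clamp(min=1e-8)
    xn = x / scale                                  # normalize to [0, 1]
    levels = torch.linspace(0, 1, 2 ** n_bits, device=x.device) ** gamma
    idx = (xn.unsqueeze(-1) - levels).abs().argmin(dim=-1)  # nearest level
    xq = levels[idx] * scale
    return x + (xq - x).detach()                    # straight-through gradient

class QuantReLU(nn.Module):
    """ReLU followed by the hypothetical non-uniform quantizer."""
    def __init__(self, n_bits=4):
        super().__init__()
        self.n_bits = n_bits

    def forward(self, x):
        return nonuniform_quantize(F.relu(x), self.n_bits)

def analog_energy_reg(model):
    """L1 proxy for analog-domain energy: crossbar column currents grow with
    cell conductance, which roughly tracks |weight| after mapping."""
    return sum(p.abs().sum() for name, p in model.named_parameters()
               if 'weight' in name)

# One training step: task loss plus the energy penalty, traded off by lam.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), QuantReLU(n_bits=4),
                      nn.Flatten(), nn.Linear(16 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
lam = 1e-5                                          # illustrative value
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))
loss = F.cross_entropy(model(images), labels) + lam * analog_energy_reg(model)
opt.zero_grad()
loss.backward()
opt.step()
```

Under this reading, `lam` controls the accuracy/energy trade-off: a larger penalty drives weights (and thus the modeled crossbar currents) toward zero at some cost in task loss.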
