Neural Network Quantization with Scale-Adjusted Training

Quantization has long been studied as a compression and acceleration technique for deep neural networks, owing to its potential to reduce model size and computational cost on both general-purpose hardware, such as DSPs, CPUs, and GPUs, and customized devices with flexible bit-width configurations, such as FPGAs and ASICs. However, previous works generally achieve network quantization at the cost of prediction accuracy relative to the full-precision counterparts. In this paper, we investigate the underlying mechanism of this performance degradation, building on the prior work of parameterized clipping activation (PACT). We find that the key factor is the weight scale in the last layer: rather than a mismatch between the weight distributions of quantized and full-precision models, as generally suggested in the literature, the main issue is that a large weight scale causes overfitting. We propose a technique called scale-adjusted training (SAT), which directly scales down the weights in the last layer to alleviate this overfitting. With the proposed technique, quantized networks can outperform their full-precision counterparts, and we achieve state-of-the-art accuracy with consistent improvements over previous quantization methods for lightweight models, including MobileNet V1/V2, on ImageNet classification.
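
As a rough illustration of the idea stated above, the PyTorch sketch below rescales the weights of the final classifier layer on every forward pass so that their effective scale stays small during (quantization-aware) training. The specific rescaling rule (dividing by the weights' standard deviation times the square root of the fan-in) and the class name ScaleAdjustedLinear are illustrative assumptions, not the exact formulation from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaleAdjustedLinear(nn.Linear):
    """Linear classifier whose effective weights are scaled down on every forward pass."""

    def forward(self, x):
        w = self.weight
        fan_in = w.size(1)
        # Illustrative rule (assumption, not the paper's exact formula):
        # divide by std(W) * sqrt(fan_in) so the effective weight variance stays
        # around 1 / fan_in, keeping the classifier's logit scale small.
        scale = (w.std() * fan_in ** 0.5).clamp_min(1e-8).detach()
        return F.linear(x, w / scale, self.bias)


if __name__ == "__main__":
    head = ScaleAdjustedLinear(1280, 1000)  # e.g. a MobileNet V2 classifier head
    logits = head(torch.randn(8, 1280))
    print(logits.shape)  # torch.Size([8, 1000])
```

In this sketch the scale factor is detached from the computation graph, so it acts as a per-step constant rescaling of the classifier weights rather than an extra learnable transformation.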

[1] G. Hua, et al. LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks, 2018, ECCV.

[2] Ran El-Yaniv, et al. Binarized Neural Networks, 2016, NIPS.

[3] Li Fei-Fei, et al. ImageNet: A Large-Scale Hierarchical Image Database, 2009, CVPR.

[4] David D. Cox, et al. Minnorm Training: An Algorithm for Training Over-Parameterized Deep Neural Networks, 2018, ArXiv.

[5] Swagath Venkataramani, et al. PACT: Parameterized Clipping Activation for Quantized Neural Networks, 2018, ArXiv.

[6] Lei Yue, et al. Attentional Alignment Networks, 2018, BMVC.

[7] Asit K. Mishra, et al. Apprentice: Using Knowledge Distillation Techniques to Improve Low-Precision Network Accuracy, 2017, ICLR.

[8] Zhiqiang Shen, et al. Learning Efficient Convolutional Networks through Network Slimming, 2017, ICCV.

[9] Ali Farhadi, et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks, 2016, ECCV.

[10] Khaled Shaalan, et al. Speech Recognition Using Deep Neural Networks: A Systematic Review, 2019, IEEE Access.

[11] Vassilis Athitsos, et al. λ-Net: Reconstruct Hyperspectral Images From a Snapshot Measurement, 2019, ICCV.

[12] Xianglong Liu, et al. Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks, 2019, ICCV.

[13] Shuchang Zhou, et al. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients, 2016, ArXiv.

[14] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.

[15] Yingwei Li, et al. Neural Architecture Search for Lightweight Non-Local Networks, 2020, CVPR.

[16] Steven K. Esser, et al. Learned Step Size Quantization, 2019, ICLR.

[17] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, ICCV.

[18] Yiming Yang, et al. DARTS: Differentiable Architecture Search, 2018, ICLR.

[19] Yingwei Li, et al. CAKES: Channel-wise Automatic KErnel Shrinking for Efficient 3D Network, 2020, ArXiv.

[20] T. Watkin, et al. The Statistical Mechanics of Learning a Rule, 1993.

[21] Nikolaos Doulamis, et al. Deep Learning for Computer Vision: A Brief Review, 2018, Comput. Intell. Neurosci.

[22] Pradeep Dubey, et al. Ternary Neural Networks with Fine-Grained Quantization, 2017, ArXiv.

[23] Xin Dong, et al. Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit?, 2018, CVPR.

[24] Frank Hutter, et al. SGDR: Stochastic Gradient Descent with Warm Restarts, 2016, ICLR.

[25] Andrew M. Saxe, et al. High-Dimensional Dynamics of Generalization Error in Neural Networks, 2017, Neural Networks.

[26] Bo Chen, et al. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, 2017, CVPR.

[27] Michael W. Mahoney, et al. Rethinking Generalization Requires Revisiting Old Ideas: Statistical Mechanics Approaches and Complex Learning Behavior, 2017, ArXiv.

[28] Jing Liu, et al. Effective Training of Convolutional Neural Networks with Low-Bitwidth Weights and Activations, 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29] Shuchang Zhou, et al. Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks, 2017, Journal of Computer Science and Technology.

[30] Chen Feng, et al. A Quantization-Friendly Separable Convolution for MobileNets, 2018, EMC2 Workshop.

[31] Ning Xu, et al. Slimmable Neural Networks, 2018, ICLR.

[32] Jinwon Lee, et al. QKD: Quantization-aware Knowledge Distillation, 2019, ArXiv.

[33] Erik Cambria, et al. Recent Trends in Deep Learning Based Natural Language Processing, 2017, IEEE Comput. Intell. Mag.

[34] Eunhyeok Park, et al. Weighted-Entropy-Based Quantization for Deep Neural Networks, 2017, CVPR.

[35] Yoshua Bengio, et al. BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations, 2015, NIPS.

[36] Song Han, et al. Trained Ternary Quantization, 2016, ICLR.

[37] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[38] Albert Gural, et al. Trained Uniform Quantization for Accurate and Efficient Neural Network Inference on Fixed-Point Hardware, 2019, ArXiv.

[39] Shenghuo Zhu, et al. Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM, 2017, AAAI.

[40] Andreas Geiger, et al. Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art, 2017, Found. Trends Comput. Graph. Vis.

[41] Jae-Joon Han, et al. Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss, 2018, CVPR.

[42] Yingwei Li, et al. Volumetric Medical Image Segmentation: A 3D Deep Coarse-to-Fine Framework and Its Adversarial Examples, 2020, Deep Learning and Convolutional Neural Networks for Medical Imaging and Clinical Informatics.

[43] Bin Liu, et al. Ternary Weight Networks, 2016, ArXiv.

[44] Yuandong Tian, et al. Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search, 2018, ArXiv.

[45] Lihi Zelnik-Manor, et al. Knapsack Pruning with Inner Distillation, 2020, ArXiv.

[46] Heng Huang, et al. Direct Shape Regression Networks for End-to-End Face Alignment, 2018, CVPR.

[47] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.

[48] Raghuraman Krishnamoorthi. Quantizing Deep Convolutional Networks for Efficient Inference: A Whitepaper, 2018, ArXiv.

[49] Hadi Esmaeilzadeh, et al. ReLeQ: An Automatic Reinforcement Learning Approach for Deep Quantization of Neural Networks, 2018.

[50] Hongbin Zha, et al. Alternating Multi-bit Quantization for Recurrent Neural Networks, 2018, ICLR.

[51] Yi Yang, et al. Network Pruning via Transformable Architecture Search, 2019, NeurIPS.

[52] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.

[53] H. Sompolinsky, et al. Statistical Mechanics of Learning from Examples, 1992, Physical Review A.

[54] Yuval Tassa, et al. Continuous Control with Deep Reinforcement Learning, 2015, ICLR.

[55] Zhijian Liu, et al. HAQ: Hardware-Aware Automated Quantization with Mixed Precision, 2018, CVPR.

[56] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.

[57] Yu Bai, et al. ProxQuant: Quantized Neural Networks via Proximal Operators, 2018, ICLR.