Low-bit Quantization Needs Good Distribution

It is challenging for low-bit quantization (e.g., 4-bit weights and activations) to maintain high performance with such limited model capacity. The distributions of both weights and activations in deep neural networks are naturally Gaussian-like. However, given the limited bitwidth of low-bit models, uniform-like distributions of weights and activations have been shown to be more quantization-friendly while preserving accuracy. Motivated by this, we propose Scale-Clip, a Distribution Reshaping technique that dynamically reshapes weights or activations into a uniform-like distribution. Furthermore, to increase the capacity of a low-bit model, we propose a novel Group-based Quantization algorithm that splits the filters into several groups. Different groups can learn different quantization parameters, which can be elegantly merged into the batch normalization layer without extra computational cost at inference time. Finally, we integrate the Scale-Clip technique with the Group-based Quantization algorithm into the Group-based Distribution Reshaping Quantization (GDRQ) framework to further improve quantization performance. Experiments on various networks (e.g., VGGNet and ResNet) and vision tasks (e.g., classification, detection, and segmentation) demonstrate that our framework achieves much better performance than state-of-the-art quantization methods. In particular, a ResNet-50 model with 2-bit weights and 4-bit activations obtained by our framework suffers less than 1% accuracy drop on the ImageNet classification task, which is, to the best of our knowledge, a new state of the art.
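To make the two ideas concrete, the sketch below illustrates one plausible reading of the pipeline: clip each group of filters to a scale (reshaping the weights toward a uniform-like range) and then quantize each group with its own step size. The clipping rule (a multiple of the group's mean absolute value), the `clip_factor` value, and the symmetric level count are assumptions for illustration only; the abstract does not specify these details.

```python
import torch

def scale_clip_group_quantize(weight, n_bits=2, n_groups=4, clip_factor=2.0):
    """Hedged sketch of a Scale-Clip-style clip followed by group-wise
    uniform quantization. The mean-absolute-value clipping rule and
    clip_factor are assumptions, not the paper's exact formulation."""
    out_channels = weight.shape[0]
    q_weight = torch.empty_like(weight)
    levels = 2 ** (n_bits - 1) - 1  # symmetric signed levels, e.g. {-1, 0, +1} for 2-bit

    # Split filters (output channels) into groups; each group learns/uses its own scale.
    for group in torch.arange(out_channels).chunk(n_groups):
        w = weight[group]
        clip = float(clip_factor * w.abs().mean())  # assumed per-group clipping threshold
        w_clipped = w.clamp(-clip, clip)            # reshape toward a uniform-like range
        step = clip / levels                        # per-group quantization step size
        q_weight[group] = (w_clipped / step).round() * step
    return q_weight

# Example: quantize a conv layer's weights to 2 bits with 4 filter groups.
w = torch.randn(64, 32, 3, 3)
wq = scale_clip_group_quantize(w, n_bits=2, n_groups=4)
```

Because each group's scale is a single multiplicative factor per output channel, it can in principle be folded into the following batch normalization layer, which is how the framework avoids extra inference cost.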
