Mixed-precision quantized neural networks with progressively decreasing bitwidth

Abstract Efficient model inference is an important and practical issue in deploying deep neural networks on resource-constrained platforms. Network quantization addresses this problem effectively by leveraging low-bit representations and arithmetic that can be executed on dedicated embedded systems. In previous works, the parameter bitwidth is set homogeneously across layers, which imposes a trade-off between superior performance and aggressive compression. In fact, the stacked network layers, generally regarded as hierarchical feature extractors, contribute unevenly to the overall performance. For a well-trained neural network, the feature distributions of different categories become progressively better organized as the network propagates forward, so the capability required of the subsequent feature extractors decreases. This suggests that neurons in posterior layers can be assigned lower bitwidths in quantized neural networks. Based on this observation, a simple yet effective mixed-precision quantized neural network with progressively decreasing bitwidth is proposed to improve the trade-off between accuracy and compression. Extensive experiments on typical network architectures and benchmark datasets demonstrate that the proposed method achieves better or comparable results while reducing the memory required for quantized parameters by more than 25% compared with its homogeneous counterparts. The results also show that higher-precision bottom layers boost 1-bit network performance appreciably by better preserving the original image information, while lower-precision posterior layers contribute to the regularization of k-bit networks.
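The core idea, quantizing posterior layers more aggressively than bottom layers, can be sketched in a few lines of PyTorch. The fragment below is only a minimal illustration: it assumes a DoReFa-style weight quantizer with a straight-through estimator and a hypothetical 8/6/4/2-bit schedule, neither of which is claimed to be the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize_k(x, bits):
    """Round a tensor in [0, 1] onto 2^bits - 1 uniform levels, using a
    straight-through estimator so gradients pass through the rounding."""
    levels = 2 ** bits - 1
    x_q = torch.round(x * levels) / levels
    return x + (x_q - x).detach()  # quantized forward value, identity gradient


class QuantConv2d(nn.Conv2d):
    """Convolution whose weights are quantized to `bits` at forward time
    (DoReFa-style weight quantization; an illustrative assumption)."""

    def __init__(self, *args, bits=8, **kwargs):
        super().__init__(*args, **kwargs)
        self.bits = bits

    def forward(self, x):
        if self.bits >= 32:
            w_q = self.weight                           # keep full precision
        else:
            w = torch.tanh(self.weight)
            w = w / (2 * w.abs().max()) + 0.5           # map weights to [0, 1]
            w_q = 2 * quantize_k(w, self.bits) - 1      # quantize, back to [-1, 1]
        return F.conv2d(x, w_q, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


# Hypothetical progressively decreasing bitwidth schedule, from the
# bottom (input-side) layers to the top (output-side) layers.
bitwidths = [8, 6, 4, 2]
layers, in_ch = [], 3
for b in bitwidths:
    layers += [QuantConv2d(in_ch, 16, kernel_size=3, padding=1, bits=b),
               nn.BatchNorm2d(16), nn.ReLU()]
    in_ch = 16
model = nn.Sequential(*layers)

# Dummy forward pass on a CIFAR-sized input.
out = model(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 16, 32, 32])
```

Because earlier layers dominate neither the parameter count nor the memory budget in isolation, the per-layer schedule above trades a small amount of storage in the bottom layers for aggressive compression of the posterior ones, which is the trade-off the abstract describes.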
