Efficient Bitwidth Search for Practical Mixed Precision Neural Network

Network quantization has rapidly become one of the most widely used methods to compress and accelerate deep neural networks. Recent efforts propose to quantize the weights and activations of different layers with different precision to improve the overall performance. However, it is challenging to find the optimal bitwidth (i.e., precision) for the weights and activations of each layer efficiently. Meanwhile, it remains unclear how to perform convolution efficiently on generic hardware platforms when weights and activations have different precision. To resolve these two issues, in this paper we first propose an Efficient Bitwidth Search (EBS) algorithm, which reuses the meta weights across candidate quantization bitwidths so that the strength of each candidate precision can be optimized directly with respect to the objective without maintaining superfluous weight copies, reducing both the memory and computational cost significantly. Second, we propose a binary decomposition algorithm that converts weights and activations of different precision into binary matrices, making mixed precision convolution efficient and practical. Experimental results on the CIFAR10 and ImageNet datasets demonstrate that our mixed precision quantized neural networks outperform both handcrafted uniform-bitwidth counterparts and other mixed precision techniques.
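The following is a minimal sketch (not the authors' released code) of the bitwidth-search idea described above: a single shared "meta weight" is quantized at every candidate bitwidth, and a softmax over learnable strengths mixes the candidates so that the strengths can be trained directly against the task loss without per-bitwidth weight copies. The layer name, the candidate bitwidths, and the DoReFa-style quantizer are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize_k(x, k):
    """Uniform k-bit quantization of values in [0, 1] with a straight-through estimator."""
    n = 2 ** k - 1
    xq = torch.round(x * n) / n
    return x + (xq - x).detach()   # identity gradient w.r.t. x


class BitSearchConv2d(nn.Module):
    """Conv layer whose weight bitwidth is a softmax-weighted mixture of candidates."""

    def __init__(self, in_ch, out_ch, kernel, bit_candidates=(2, 4, 8)):
        super().__init__()
        self.meta_weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel, kernel) * 0.1)
        self.bits = bit_candidates
        # one learnable strength per candidate bitwidth
        self.alpha = nn.Parameter(torch.zeros(len(bit_candidates)))

    def forward(self, x):
        w = torch.tanh(self.meta_weight)
        w = w / (2 * w.abs().max()) + 0.5             # map weights into [0, 1]
        probs = F.softmax(self.alpha, dim=0)
        # every candidate reuses the same meta weight: no superfluous copies
        w_mix = sum(p * (2 * quantize_k(w, b) - 1)    # map back to [-1, 1]
                    for p, b in zip(probs, self.bits))
        return F.conv2d(x, w_mix)
```

Similarly, the sketch below illustrates the binary-decomposition idea under stated assumptions: a k-bit unsigned matrix is split into k binary bit-planes, so the product of an M-bit weight matrix and an N-bit activation matrix reduces to M*N binary matrix products (AND plus popcount on bit-serial hardware), each scaled by a power of two. The shapes and bitwidths are illustrative only.

```python
import numpy as np


def bit_planes(x, bits):
    """Decompose a non-negative integer array into `bits` binary planes (LSB first)."""
    return [((x >> i) & 1).astype(np.int64) for i in range(bits)]


def mixed_precision_matmul(w, a, w_bits, a_bits):
    """Compute w @ a via binary matrix products of the bit-planes."""
    acc = np.zeros((w.shape[0], a.shape[1]), dtype=np.int64)
    for i, wp in enumerate(bit_planes(w, w_bits)):
        for j, ap in enumerate(bit_planes(a, a_bits)):
            acc += (wp @ ap) << (i + j)   # scale each binary product by 2^(i+j)
    return acc


# consistency check against an ordinary integer matmul
w = np.random.randint(0, 2 ** 3, size=(4, 8))   # 3-bit weights
a = np.random.randint(0, 2 ** 5, size=(8, 6))   # 5-bit activations
assert np.array_equal(mixed_precision_matmul(w, a, 3, 5), w @ a)
```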
