Residual Quantization for Low Bit-Width Neural Networks

Neural network quantization has been shown to be an effective technique for network compression and acceleration. However, existing binary and ternary quantization methods suffer from two major issues. First, quantizing inputs/activations to low bit-widths easily causes severe degradation in prediction accuracy. Second, network training and quantization are typically treated as two unrelated tasks, so parameter-training error and quantization error accumulate. In this work, we introduce a novel scheme, named Residual Quantization, for training a neural network whose weights and inputs are both constrained to low bit-width, e.g., binary or ternary, values. On the one hand, by recursively quantizing the residual left by each quantization step, the resulting binary/ternary network is guaranteed to approximate the full-precision network with much smaller error. On the other hand, we mathematically reformulate network training in an EM-like manner that alternates between network quantization and parameter optimization. In the expectation step, the low bit-width network is encouraged to approximate the full-precision network; in the maximization step, it is further tuned to improve its representational capability. Extensive experiments demonstrate that the proposed quantization scheme outperforms previous low bit-width methods and comes much closer to the performance of its full-precision counterpart.
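The abstract does not include code, so the following NumPy sketch is only our illustration of the two ideas it describes: recursively quantizing the residual so each added binary term shrinks the approximation error, and an EM-like loop that alternates quantization with parameter updates. All names and settings here (residual_binary_quantize, num_terms, the toy least-squares problem, the learning rate) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def residual_binary_quantize(w, num_terms=2):
    """Approximate a full-precision array w as a sum of scaled binary
    arrays, w ~ sum_i alpha_i * b_i with b_i in {-1, +1}. Each round
    binarizes the current residual, so every extra term shrinks the error."""
    residual = w.astype(np.float64)
    alphas, binaries = [], []
    for _ in range(num_terms):
        b = np.where(residual >= 0, 1.0, -1.0)  # sign of the residual
        alpha = np.abs(residual).mean()         # L2-optimal scale for this b
        alphas.append(alpha)
        binaries.append(b)
        residual = residual - alpha * b         # quantize what is left next round
    return alphas, binaries

rng = np.random.default_rng(0)

# Reconstruction error drops as more residual terms are used.
w = rng.normal(size=(64, 64))
for k in (1, 2, 3):
    alphas, bs = residual_binary_quantize(w, num_terms=k)
    w_hat = sum(a * b for a, b in zip(alphas, bs))
    print(k, np.linalg.norm(w - w_hat) / np.linalg.norm(w))

# EM-style alternation on a toy least-squares problem (our stand-in for the
# paper's training scheme): the E-step snaps the latent full-precision weights
# to their low bit-width approximation; the M-step takes a gradient step
# evaluated at the quantized weights but applied to the latent copy, in the
# spirit of a straight-through update.
X = rng.normal(size=(256, 64))
y = X @ rng.normal(size=64)
w_fp = np.zeros(64)
for step in range(200):
    a_k, b_k = residual_binary_quantize(w_fp, num_terms=2)  # E-step
    w_q = sum(a * b for a, b in zip(a_k, b_k))
    grad = X.T @ (X @ w_q - y) / len(y)                     # M-step
    w_fp -= 0.1 * grad
```

In this toy setup, the quantized reconstruction error decreases monotonically with the number of residual terms, and the alternating loop drives the loss of the quantized model down even though every forward evaluation uses only binary weight terms; how the paper couples the two steps in full network training is, of course, its own contribution.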
