Towards Convolutional Neural Networks Compression via Global&Progressive Product Quantization

In recent years, convolutional neural networks have achieved great success in a wide range of visual applications. However, these networks typically incur high storage and computation costs, which prohibits their deployment in resource-limited applications. In this paper, we introduce Global & Progressive Product Quantization (G&P PQ), an end-to-end product-quantization-based network compression method that merges the separate quantization and fine-tuning stages into a single, consistent training framework. Compared to existing two-stage methods, we avoid the time-consuming process of choosing layer-wise fine-tuning hyperparameters, and by quantizing globally and progressively we also enable the network to learn complex dependencies among layers. To validate its effectiveness, we benchmark G&P PQ on ResNet-like architectures for image classification and demonstrate a state-of-the-art model-size vs. accuracy trade-off across extensive compression configurations compared to previous methods.
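For intuition, the sketch below shows how plain product quantization compresses a single layer's weight matrix: each row is split into fixed-size sub-vectors that are clustered with k-means, so the layer is stored as a small codebook plus one index per block. The block size, codebook size, and the NumPy k-means loop are illustrative assumptions; this is a minimal sketch of the underlying PQ idea, not the paper's global and progressive training procedure, which additionally refines the quantized network end-to-end.

```python
# Minimal sketch of product quantization (PQ) applied to one layer's weight
# matrix. Block size, codebook size, and the plain k-means loop are
# illustrative assumptions, not the exact G&P PQ configuration from the paper.
import numpy as np


def product_quantize(W, block_size=4, n_codewords=256, n_iters=20, seed=0):
    """Split each row of W into contiguous sub-vectors of length block_size,
    cluster them with k-means, and return (codebook, assignments)."""
    rng = np.random.default_rng(seed)
    out_f, in_f = W.shape
    assert in_f % block_size == 0, "block_size must divide the input dimension"

    # All sub-vectors of the layer, shape (n_blocks, block_size).
    blocks = W.reshape(-1, block_size)

    # Initialize the codebook from randomly chosen sub-vectors.
    codebook = blocks[rng.choice(len(blocks), size=n_codewords, replace=False)].copy()
    assignments = np.zeros(len(blocks), dtype=np.int64)

    for _ in range(n_iters):
        # Assignment step: nearest codeword for every sub-vector.
        dists = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assignments = dists.argmin(axis=1)
        # Update step: each codeword becomes the mean of its assigned blocks.
        for k in range(n_codewords):
            members = blocks[assignments == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    return codebook, assignments


def reconstruct(codebook, assignments, shape):
    """Rebuild the approximate weight matrix from the codebook and assignments."""
    return codebook[assignments].reshape(shape)


if __name__ == "__main__":
    W = np.random.randn(64, 128).astype(np.float32)  # toy "layer" weights
    codebook, assignments = product_quantize(W, block_size=4, n_codewords=64)
    W_hat = reconstruct(codebook, assignments, W.shape)
    print("reconstruction MSE:", float(((W - W_hat) ** 2).mean()))
```

Storing a small codebook plus one index per block in place of the raw weights is what yields the compression: with 256 codewords, each block of d 32-bit floats is replaced by a single byte, roughly a 4d-fold reduction before codebook overhead. The G&P PQ framework described above then fine-tunes the quantized network globally rather than tuning each layer's codebook in isolation.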
