Towards Convolutional Neural Networks Compression via Global&Progressive Product Quantization

In recent years, convolutional neural networks have achieved great success in a wide range of visual applications. However, these networks typically incur high storage and computation costs, which prohibits their deployment in resource-limited applications. In this paper, we introduce Global & Progressive Product Quantization (G&P PQ), an end-to-end product-quantization-based network compression method that merges the separate quantization and fine-tuning stages into a single, consistent training framework. Compared to existing two-stage methods, we avoid the time-consuming process of choosing layer-wise fine-tuning hyperparameters, and by quantizing globally and progressively we also enable the network to learn complex dependencies among layers. To validate its effectiveness, we benchmark G&P PQ on ResNet-like architectures for image classification and demonstrate a state-of-the-art model-size vs. accuracy trade-off across extensive compression configurations compared to previous methods.
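For intuition, the sketch below shows how plain product quantization compresses a single layer's weight matrix: each row is split into fixed-size sub-vectors that are clustered with k-means, so the layer is stored as a small codebook plus one index per block. The block size, codebook size, and the NumPy k-means loop are illustrative assumptions; this is a minimal sketch of the underlying PQ idea, not the paper's global and progressive training procedure, which additionally refines the quantized network end-to-end.

```python
# Minimal sketch of product quantization (PQ) applied to one layer's weight
# matrix. Block size, codebook size, and the plain k-means loop are
# illustrative assumptions, not the exact G&P PQ configuration from the paper.
import numpy as np


def product_quantize(W, block_size=4, n_codewords=256, n_iters=20, seed=0):
    """Split each row of W into contiguous sub-vectors of length block_size,
    cluster them with k-means, and return (codebook, assignments)."""
    rng = np.random.default_rng(seed)
    out_f, in_f = W.shape
    assert in_f % block_size == 0, "block_size must divide the input dimension"

    # All sub-vectors of the layer, shape (n_blocks, block_size).
    blocks = W.reshape(-1, block_size)

    # Initialize the codebook from randomly chosen sub-vectors.
    codebook = blocks[rng.choice(len(blocks), size=n_codewords, replace=False)].copy()
    assignments = np.zeros(len(blocks), dtype=np.int64)

    for _ in range(n_iters):
        # Assignment step: nearest codeword for every sub-vector.
        dists = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assignments = dists.argmin(axis=1)
        # Update step: each codeword becomes the mean of its assigned blocks.
        for k in range(n_codewords):
            members = blocks[assignments == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    return codebook, assignments


def reconstruct(codebook, assignments, shape):
    """Rebuild the approximate weight matrix from the codebook and assignments."""
    return codebook[assignments].reshape(shape)


if __name__ == "__main__":
    W = np.random.randn(64, 128).astype(np.float32)  # toy "layer" weights
    codebook, assignments = product_quantize(W, block_size=4, n_codewords=64)
    W_hat = reconstruct(codebook, assignments, W.shape)
    print("reconstruction MSE:", float(((W - W_hat) ** 2).mean()))
```

Storing a small codebook plus one index per block in place of the raw weights is what yields the compression: with 256 codewords, each block of d 32-bit floats is replaced by a single byte, roughly a 4d-fold reduction before codebook overhead. The G&P PQ framework described above then fine-tunes the quantized network globally rather than tuning each layer's codebook in isolation.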
