Filter-based deep-compression with global average pooling for convolutional networks

Abstract Deep neural networks are powerful, but deploying them is both memory- and time-consuming because of their large numbers of parameters and heavy computational cost. Many studies have addressed compressing models at the parameter level as well as at the bit level. Here, we propose an efficient strategy that compresses the layers which dominate computation or memory usage. We compress the model by introducing global average pooling, performing iterative pruning of the filters with the proposed order-deciding scheme so that pruning proceeds more efficiently, applying truncated SVD to the fully-connected layer, and performing quantization. Experiments on the VGG16 model show that our approach achieves a 60.9× compression ratio in off-line storage with about 0.848% and 0.1378% loss of accuracy in the top-1 and top-5 classification results, respectively, on the ILSVRC2012 validation dataset. Our approach also shows good compression results on AlexNet and Faster R-CNN.
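As a rough illustration of two of the steps named above, the sketch below shows (a) ranking convolutional filters for pruning and (b) replacing a fully-connected weight matrix with a truncated-SVD factorization. The abstract does not specify the order-deciding scheme, so the L1-norm ranking used here is only an assumed stand-in, and the layer sizes and `keep_ratio`/`rank` values are hypothetical; this is a minimal NumPy sketch, not the paper's implementation.

```python
import numpy as np

def prune_filters_by_l1(conv_w, keep_ratio=0.5):
    """Rank conv filters and keep the strongest fraction.

    conv_w has shape (num_filters, channels, kH, kW). L1-norm ranking is an
    assumed criterion here; the paper's own order-deciding scheme may differ.
    The returned indices would be used to slice this layer's filters and the
    next layer's corresponding input channels.
    """
    norms = np.abs(conv_w).sum(axis=(1, 2, 3))          # one score per filter
    n_keep = max(1, int(keep_ratio * conv_w.shape[0]))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])     # indices of kept filters
    return keep

def truncated_svd_fc(W, rank):
    """Approximate a fully-connected weight matrix W (out x in) with rank-k factors.

    One dense layer of size (out x in) becomes two layers of sizes (out x k)
    and (k x in), which saves parameters whenever k << min(out, in).
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    U_k = U[:, :rank] * S[:rank]   # (out x k), singular values folded into U
    V_k = Vt[:rank, :]             # (k x in)
    return U_k, V_k

# Hypothetical example: compress a 4096x4096 FC layer to rank 256.
W = np.random.randn(4096, 4096).astype(np.float32)
U_k, V_k = truncated_svd_fc(W, rank=256)
print(W.size, U_k.size + V_k.size)   # parameter count before vs. after
```

In this example the factorization stores roughly 2.1M parameters instead of 16.8M, an 8× reduction for that layer alone; the further gains reported in the abstract come from combining this with global average pooling, filter pruning, and quantization.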
