Automatic Compression Ratio Allocation for Pruning Convolutional Neural Networks

Convolutional neural networks (CNNs) have demonstrated significant performance improvements in many application scenarios. However, their high computational complexity and model size have limited their deployment on mobile and embedded devices. Various approaches have been proposed to compress CNNs. Filter pruning is widely considered a promising solution because it can significantly speed up inference and reduce memory consumption. In practice, however, most approaches prune filters with manually allocated compression ratios, which relies heavily on individual expertise and is unfriendly to non-professional users. In this paper, we propose an Automatic Compression Ratio Allocation (ACRA) scheme based on the binary search algorithm for pruning convolutional neural networks. Specifically, ACRA provides two strategies for allocating compression ratios automatically. First, the uniform pruning strategy assigns the same compression ratio to every layer; this ratio is obtained by binary search given the target FLOPs reduction of the whole network. Second, the sensitivity-based pruning strategy assigns each layer an appropriate compression ratio according to that layer's sensitivity to accuracy. Experimental results on VGG-11 and VGG-16 demonstrate that our scheme reduces FLOPs significantly while maintaining a high accuracy level. Specifically, for VGG-16 on the CIFAR-10 dataset, we reduce FLOPs by 29.18% with only a 1.24% accuracy decrease.
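To illustrate the uniform pruning strategy described above, the following is a minimal sketch of how a single per-layer compression ratio could be found by binary search against a target FLOPs reduction. The layer shape list, the FLOPs estimator, and the tolerance are illustrative assumptions, not the authors' implementation.

```python
import math

# Assumed (out_channels, in_channels, kernel_size, out_h, out_w) for the conv
# layers of a VGG-16-like network on 32x32 CIFAR-10 inputs (illustrative only).
VGG16_CONV_LAYERS = [
    (64, 3, 3, 32, 32), (64, 64, 3, 32, 32),
    (128, 64, 3, 16, 16), (128, 128, 3, 16, 16),
    (256, 128, 3, 8, 8), (256, 256, 3, 8, 8), (256, 256, 3, 8, 8),
    (512, 256, 3, 4, 4), (512, 512, 3, 4, 4), (512, 512, 3, 4, 4),
    (512, 512, 3, 2, 2), (512, 512, 3, 2, 2), (512, 512, 3, 2, 2),
]


def conv_flops(out_c, in_c, k, h, w):
    """Multiply-accumulate count of one convolutional layer."""
    return out_c * in_c * k * k * h * w


def total_flops(layers, ratio=0.0):
    """Total conv FLOPs after pruning a fraction `ratio` of filters per layer.

    Pruning filters in layer i also removes the matching input channels of
    layer i+1, so both channel dimensions shrink by (1 - ratio).
    """
    keep = 1.0 - ratio
    flops = 0
    for i, (out_c, in_c, k, h, w) in enumerate(layers):
        kept_out = math.ceil(out_c * keep)
        kept_in = in_c if i == 0 else math.ceil(layers[i - 1][0] * keep)
        flops += conv_flops(kept_out, kept_in, k, h, w)
    return flops


def uniform_ratio_by_binary_search(layers, target_reduction, tol=1e-3):
    """Binary-search a uniform compression ratio meeting `target_reduction`.

    FLOPs reduction grows monotonically with the pruning ratio, so standard
    bisection on the interval [0, 1] converges to the smallest ratio whose
    reduction reaches the target (within `tol`).
    """
    base = total_flops(layers)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        reduction = 1.0 - total_flops(layers, mid) / base
        if reduction < target_reduction:
            lo = mid  # not enough reduction: prune more aggressively
        else:
            hi = mid  # target met or exceeded: try pruning less
    return hi


if __name__ == "__main__":
    ratio = uniform_ratio_by_binary_search(VGG16_CONV_LAYERS,
                                           target_reduction=0.2918)
    print(f"uniform per-layer compression ratio ~= {ratio:.4f}")
```

The sensitivity-based strategy would replace the single ratio with a per-layer one chosen from each layer's measured accuracy sensitivity; that step depends on evaluation data and is not shown here.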
