An Efficient End-to-End Channel-Level Pruning Method for Deep Neural Network Compression

Deep neural networks (DNNs) have achieved compelling performance on many visual tasks, but at the cost of a significant increase in computation and memory consumption, which severely impedes their deployment on resource-constrained systems such as mobile phones and embedded devices. To address this problem, compressing DNNs has recently received increasing attention. In this paper, we propose an effective end-to-end channel pruning approach to compress DNNs. To this end, we first introduce additional auxiliary classifiers to enhance the discriminative power of the shallow and intermediate layers. Second, we impose L1 regularization on the scaling and shifting factors of the batch normalization (BN) layers and adopt the fast iterative shrinkage-thresholding algorithm (FISTA) to effectively identify redundant channels. Finally, by forcing the selected factors to zero, we can safely prune the corresponding unimportant channels and obtain a compact model. We empirically demonstrate the strong performance of our approach with several state-of-the-art DNN architectures, including VGGNet and MobileNet, on different datasets. For instance, on the CIFAR-10 dataset, the pruned MobileNet achieves a 26.9× reduction in model parameters and a 3.9× reduction in computational operations with only a 0.04% increase in classification error.
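The core of the pruning step described above, an L1 penalty on the BN scaling/shifting factors optimized with FISTA's proximal (soft-thresholding) update, followed by removing channels whose factors have been driven to zero, can be sketched as follows. This is a minimal PyTorch illustration under assumed hyperparameters; the names `soft_threshold`, `fista_update_bn`, `channels_to_prune`, and the values of `lr`, `lam`, and `thresh` are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

def soft_threshold(x, thresh):
    # Proximal operator of the L1 norm: shrinks values toward zero.
    return torch.sign(x) * torch.clamp(x.abs() - thresh, min=0.0)

@torch.no_grad()
def fista_update_bn(bn, state, lr=0.01, lam=1e-4):
    """One FISTA-style proximal step on a BatchNorm2d layer's scale (gamma)
    and shift (beta), applied after the usual backward pass has filled .grad.
    `state` carries the previous proximal iterate and the momentum scalar t."""
    t_prev = state.get("t", 1.0)
    # FISTA momentum update: t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2
    t_new = (1.0 + (1.0 + 4.0 * t_prev ** 2) ** 0.5) / 2.0
    for name in ("weight", "bias"):  # gamma and beta of the BN layer
        p = getattr(bn, name)
        if p.grad is None:
            continue
        prev = state.setdefault(name, p.detach().clone())
        # Gradient step on the smooth loss, then L1 proximal (soft-threshold).
        x_new = soft_threshold(p - lr * p.grad, lr * lam)
        # Extrapolated iterate is stored in the parameter itself, so the next
        # forward/backward pass evaluates gradients at the FISTA lookahead point.
        p.copy_(x_new + ((t_prev - 1.0) / t_new) * (x_new - prev))
        state[name] = x_new
    state["t"] = t_new

def channels_to_prune(bn, thresh=1e-3):
    # Channels whose scale factor has been driven (close) to zero are
    # treated as unimportant and can be removed together with the
    # corresponding filters of the preceding convolution.
    return (bn.weight.detach().abs() < thresh).nonzero(as_tuple=True)[0]
```

In an assumed training loop, `fista_update_bn` would be called for every BN layer after the regular backward pass; once training finishes, the channel indices returned by `channels_to_prune` (together with the matching filters of the preceding convolution) are removed to obtain the compact model.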
