A Unified-Model via Block Coordinate Descent for Learning the Importance of Filter

Deep convolutional neural networks (CNNs) are increasingly used in multimedia retrieval, and accelerating them has received ever-increasing research attention. Among the various approaches proposed in the literature, filter pruning is regarded as a promising solution because it yields significant speedup and reduces the memory footprint of both the network model and the intermediate feature maps. Many works identify unimportant filters and then prune them to accelerate deep CNNs. However, these works mainly rely on heuristics to evaluate filter importance, such as statistical information of the filters (e.g., pruning filters with small $\ell_2$-norm), which may be suboptimal. In this paper, we propose a novel filter pruning method, namely A Unified-Model via Block Coordinate Descent for Learning the Importance of Filter (U-BCD). In U-BCD, the importance of the filters is learned by optimization: the filter parameters and the importance scores are learned simultaneously with a block coordinate descent method. The effectiveness of U-BCD is validated on two image classification benchmarks. Notably, on CIFAR-10, U-BCD reduces FLOPs by more than 57% on ResNet-110 with a 0.08% relative accuracy improvement, and it also achieves state-of-the-art results on ILSVRC-2012.
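The alternating scheme the abstract describes — updating one block of variables (filter weights) while the other (importance scores) is held fixed, and vice versa — can be sketched on a toy problem. The snippet below is a minimal illustration, not the paper's actual objective: the regression setup, the $\ell_1$ penalty on the scores, and all variable names are assumptions chosen to make the block-coordinate-descent structure concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (sizes and task are illustrative assumptions, not from the paper):
# one layer with n_filters filters, each a flat weight vector, on a regression task.
n_filters, dim, n_samples = 8, 16, 64
W = rng.normal(size=(n_filters, dim))       # block 1: filter weights
s = np.ones(n_filters)                      # block 2: per-filter importance scores
X = rng.normal(size=(n_samples, dim))
Y = X @ rng.normal(size=(dim, n_filters))   # synthetic targets

lam, lr = 0.05, 0.01                        # sparsity weight and step size

def loss(W, s):
    # Squared error of importance-scaled filter responses, plus an L1 penalty
    # that drives the scores of unhelpful filters toward zero.
    return ((X @ W.T * s - Y) ** 2).mean(axis=0).sum() + lam * np.abs(s).sum()

loss0 = loss(W, s)
for _ in range(200):
    # Block 1: gradient step on W with the importance scores s held fixed.
    R = X @ W.T * s - Y                     # residuals, (n_samples, n_filters)
    W -= lr * (2.0 / n_samples) * (R * s).T @ X
    # Block 2: exact per-filter update of s with W held fixed; the L1 term
    # makes this a closed-form soft-thresholding (proximal) step.
    Z = X @ W.T                             # filter responses
    a = np.mean(Z ** 2, axis=0)             # quadratic coefficient per filter
    b = np.mean(Z * Y, axis=0)              # linear coefficient per filter
    s = np.sign(b) * np.maximum(np.abs(b) - lam / 2, 0.0) / np.maximum(a, 1e-12)

pruned = np.abs(s) < 1e-3                   # filters whose learned importance vanished
```

Filters whose learned score lands at (or near) zero are the pruning candidates; the importance is thus a product of optimization rather than a fixed heuristic such as the weight norm.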
