Provable Filter Pruning for Efficient Neural Networks

We present a provable, sampling-based approach for generating compact Convolutional Neural Networks (CNNs) by identifying and removing redundant filters from an over-parameterized network. Our algorithm uses a small batch of input data points to assign a saliency score to each filter and constructs an importance sampling distribution in which filters that strongly affect the output are sampled with correspondingly high probability. In contrast to existing filter pruning approaches, our method is simultaneously data-informed, comes with provable guarantees on the size and performance of the pruned network, and applies broadly across network architectures and data sets. Our analytical bounds bridge the notions of compressibility and importance of network structures, giving rise to a fully automated procedure for identifying and preserving filters in layers that are essential to the network's performance. Our experimental evaluations on popular architectures and data sets show that our algorithm consistently generates sparser and more efficient models than those constructed by existing filter pruning approaches.
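
To make the general recipe concrete, the sketch below illustrates importance-sampling-based filter pruning on a single convolutional layer: score each filter using a small data batch, normalize the scores into a sampling distribution, and keep the filters drawn by sampling. This is a minimal illustration rather than the paper's algorithm; the saliency proxy (mean absolute activation on the batch), the names saliency_scores, prune_layer_by_sampling, and keep_ratio are hypothetical simplifications, and the sketch omits the per-layer sample-size allocation and the error bounds that the full method provides.

```python
# Minimal sketch of data-informed, sampling-based filter pruning for one
# Conv2d layer. NOT the paper's exact algorithm: the saliency proxy and the
# fixed keep ratio are illustrative assumptions.

import torch
import torch.nn as nn


def saliency_scores(conv: nn.Conv2d, batch: torch.Tensor) -> torch.Tensor:
    """Per-filter saliency: mean absolute activation over a small data batch."""
    with torch.no_grad():
        activations = conv(batch)                      # (B, C_out, H, W)
        return activations.abs().mean(dim=(0, 2, 3))   # one score per output filter


def prune_layer_by_sampling(conv: nn.Conv2d, batch: torch.Tensor,
                            keep_ratio: float = 0.5) -> nn.Conv2d:
    """Keep filters drawn with probability proportional to their saliency."""
    scores = saliency_scores(conv, batch)
    probs = scores / scores.sum()                      # importance sampling distribution
    n_keep = max(1, int(keep_ratio * conv.out_channels))
    kept = torch.multinomial(probs, n_keep, replacement=False)

    # Build a smaller layer containing only the sampled filters.
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[kept])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[kept])
    return pruned


# Usage: prune a single layer with a batch of 64 CIFAR-sized inputs.
layer = nn.Conv2d(3, 64, kernel_size=3, padding=1)
data_batch = torch.randn(64, 3, 32, 32)
smaller_layer = prune_layer_by_sampling(layer, data_batch, keep_ratio=0.25)
print(smaller_layer)  # Conv2d(3, 16, kernel_size=(3, 3), ...)
```

In a full pipeline the subsequent layer's input channels would also have to be restricted to the kept filters (and, for an unbiased estimate, the sampled activations reweighted by their inverse sampling probabilities); the sketch above shows only the per-layer sampling step.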
