ESPN: Extremely Sparse Pruned Networks

Deep neural networks are often highly overparameterized, prohibiting their use in compute-limited systems. However, a recent line of work has shown that the size of deep networks can be considerably reduced by identifying, prior to training, a subset of neuron indicators (a mask) that corresponds to significant weights. We demonstrate that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks. Our algorithm is a hybrid of single-shot network pruning methods (such as SNIP) and Lottery-Ticket-style approaches. We validate our approach on several datasets, outperforming existing pruning methods in both test accuracy and compression ratio.
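
For intuition, below is a minimal sketch of one way such a hybrid could look in PyTorch: each round scores connections SNIP-style by |dL/dw * w| on a mini-batch, keeps the top fraction under a sparsity schedule, and rewinds surviving weights to their initialization, Lottery-Ticket style. This is an illustrative assumption, not the paper's exact ESPN procedure; the function name, schedule, and hyperparameters are invented for the example.

    import copy
    import torch

    def iterative_mask_discovery(model, loss_fn, batch, target_sparsity=0.99, rounds=5):
        """Hypothetical hybrid of SNIP-style scoring and lottery-ticket rewinding."""
        init_state = copy.deepcopy(model.state_dict())   # snapshot for rewinding
        params = [p for p in model.parameters() if p.dim() > 1]  # weight tensors only
        masks = [torch.ones_like(p) for p in params]
        x, y = batch
        for r in range(1, rounds + 1):
            # SNIP-style connection sensitivity on one mini-batch: |dL/dw * w|
            model.zero_grad()
            loss_fn(model(x), y).backward()
            scores = [(p.grad * p).abs() * m for p, m in zip(params, masks)]
            # Geometric schedule that reaches target_sparsity at the final round
            sparsity = 1.0 - (1.0 - target_sparsity) ** (r / rounds)
            flat = torch.cat([s.flatten() for s in scores])
            keep = max(1, int((1.0 - sparsity) * flat.numel()))
            threshold = torch.topk(flat, keep).values.min()
            masks = [(s >= threshold).float() for s in scores]
            # Lottery-ticket step: rewind weights to init, then apply the new mask
            model.load_state_dict(init_state)
            with torch.no_grad():
                for p, m in zip(params, masks):
                    p.mul_(m)
        return masks

After the final round, the masked network would be retrained from its rewound initialization with the mask held fixed, which is how lottery-ticket-style methods typically evaluate a discovered subnetwork.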

[1] Pushmeet Kohli et al. Memory Bounded Deep Convolutional Networks, 2014, arXiv.

[2] Philip H. S. Torr et al. SNIP: Single-shot Network Pruning based on Connection Sensitivity, 2018, ICLR.

[3] Fabio Galasso et al. Adversarial Network Compression, 2018, ECCV Workshops.

[4] Wonyong Sung et al. Structured Pruning of Deep Convolutional Neural Networks, 2015, ACM J. Emerg. Technol. Comput. Syst.

[5] Misha Denil et al. Predicting Parameters in Deep Learning, 2014.

[6] Michael S. Bernstein et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.

[7] Michael Carbin et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.

[8] Suyog Gupta et al. To prune, or not to prune: exploring the efficacy of pruning for model compression, 2017, ICLR.

[9] Alexander M. Rush et al. Movement Pruning: Adaptive Sparsity by Fine-Tuning, 2020, NeurIPS.

[10] Yixin Chen et al. Compressing Neural Networks with the Hashing Trick, 2015, ICML.

[11] Babak Hassibi et al. Second Order Derivatives for Network Pruning: Optimal Brain Surgeon, 1992, NIPS.

[12] Gianluca Francini et al. Learning Sparse Neural Networks via Sensitivity-Driven Regularization, 2018, NeurIPS.

[13] M. Yuan et al. Model selection and estimation in the Gaussian graphical model, 2007.

[14] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[15] Song Han et al. AMC: AutoML for Model Compression and Acceleration on Mobile Devices, 2018, ECCV.

[16] Song Han et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.

[17] Jiwen Lu et al. Runtime Neural Pruning, 2017, NIPS.

[18] Mingjie Sun et al. Rethinking the Value of Network Pruning, 2018, ICLR.

[19] Ilya Sutskever et al. Language Models are Unsupervised Multitask Learners, 2019.

[20] Lucas Theis et al. Faster gaze prediction with dense networks and Fisher pruning, 2018, arXiv.

[21] Michael C. Mozer et al. Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment, 1988, NIPS.

[22] Sanguthevar Rajasekaran et al. AutoPrune: Automatic Network Pruning by Regularizing Auxiliary Parameters, 2019, NeurIPS.

[23] Timo Aila et al. Pruning Convolutional Neural Networks for Resource Efficient Transfer Learning, 2016, arXiv.

[24] M. Maire et al. Winning the Lottery with Continuous Sparsification, 2019, NeurIPS.

[25] Dan Alistarh et al. Model compression via distillation and quantization, 2018, ICLR.

[26] Miguel Á. Carreira-Perpiñán et al. "Learning-Compression" Algorithms for Neural Net Pruning, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Max Welling et al. Learning Sparse Neural Networks through L0 Regularization, 2017, ICLR.

[28] Dmitry P. Vetrov et al. Variational Dropout Sparsifies Deep Neural Networks, 2017, ICML.

[29] Saurabh Singh et al. Model Compression by Entropy Penalized Reparameterization, 2019, arXiv.

[30] Yves Chauvin et al. A Back-Propagation Algorithm with Optimal Use of Hidden Units, 1988, NIPS.

[31] Rudy Setiono et al. A Penalty-Function Approach for Pruning Feedforward Neural Networks, 1997, Neural Computation.

[32] Matthijs Douze et al. Fixing the train-test resolution discrepancy: FixEfficientNet, 2020, arXiv.

[33] Vineeth N. Balasubramanian et al. Deep Model Compression: Distilling Knowledge from Noisy Teachers, 2016, arXiv.

[34] Tao Zhang et al. A Survey of Model Compression and Acceleration for Deep Neural Networks, 2017, arXiv.

[35] David Kappel et al. Deep Rewiring: Training very sparse deep networks, 2017, ICLR.

[36] Maarten Stol et al. Pruning via Iterative Ranking of Sensitivity Statistics, 2020, arXiv.

[37] Jason Yosinski et al. Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask, 2019, NeurIPS.

[38] David P. Wipf et al. Compressing Neural Networks using the Variational Information Bottleneck, 2018, ICML.

[39] Roger B. Grosse et al. Picking Winning Tickets Before Training by Preserving Gradient Flow, 2020, ICLR.

[40] Xin Wang et al. Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization, 2019, ICML.

[41] Ran El-Yaniv et al. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations, 2016, J. Mach. Learn. Res.

[42] Natalia Gimelshein et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[43] Max Welling et al. Soft Weight-Sharing for Neural Network Compression, 2017, ICLR.

[44] Yurong Chen et al. Dynamic Network Surgery for Efficient DNNs, 2016, NIPS.

[45] Diederik P. Kingma et al. GPU Kernels for Block-Sparse Weights, 2017.

[46] Gintare Karolina Dziugaite et al. Stabilizing the Lottery Ticket Hypothesis, 2019.

[47] Jianxin Wu et al. ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression, 2017, IEEE International Conference on Computer Vision (ICCV).

[48] Max Welling et al. Bayesian Compression for Deep Learning, 2017, NIPS.

[49] Alex Krizhevsky et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[50] Zhiqiang Shen et al. Learning Efficient Convolutional Networks through Network Slimming, 2017, IEEE International Conference on Computer Vision (ICCV).

[51] Peter Stone et al. Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science, 2017, Nature Communications.

[52] Kilian Q. Weinberger et al. CondenseNet: An Efficient DenseNet Using Learned Group Convolutions, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53] Michael Carbin et al. Comparing Rewinding and Fine-tuning in Neural Network Pruning, 2019, ICLR.