Martin Jaggi | Luis Barba | Tao Lin | Sebastian U. Stich | Daniil Dmitriev
[1] Michael C. Mozer, et al. Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment, 1988, NIPS.
[2] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.
[3] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[4] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[5] Mark W. Schmidt, et al. A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method, 2012, ArXiv.
[6] Yoshua Bengio, et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation, 2013, ArXiv.
[7] Song Han, et al. Learning both Weights and Connections for Efficient Neural Network, 2015, NIPS.
[8] Yoshua Bengio, et al. BinaryConnect: Training Deep Neural Networks with binary weights during propagations, 2015, NIPS.
[9] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[10] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[11] Mathieu Salzmann, et al. Learning the Number of Neurons in Deep Networks, 2016, NIPS.
[12] Nikos Komodakis, et al. Wide Residual Networks, 2016, BMVC.
[13] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[14] Song Han, et al. DSD: Regularizing Deep Neural Networks with Dense-Sparse-Dense Training Flow, 2016, ArXiv.
[15] Kilian Q. Weinberger, et al. Deep Networks with Stochastic Depth, 2016, ECCV.
[16] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[17] Yurong Chen, et al. Dynamic Network Surgery for Efficient DNNs, 2016, NIPS.
[18] Erich Elsen, et al. Exploring Sparsity in Recurrent Neural Networks, 2017, ICLR.
[19] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[20] Alex Kendall, et al. Concrete Dropout, 2017, NIPS.
[21] Dmitry P. Vetrov, et al. Variational Dropout Sparsifies Deep Neural Networks, 2017, ICML.
[22] Mathieu Salzmann, et al. Compression-aware Training of Deep Networks, 2017, NIPS.
[23] Hanan Samet, et al. Training Quantized Nets: A Deeper Understanding, 2017, NIPS.
[24] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2017, CVPR.
[25] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017, NIPS Autodiff Workshop.
[26] R. Venkatesh Babu, et al. Training Sparse Neural Networks, 2017, CVPR Workshops (CVPRW).
[27] Dmitry P. Vetrov, et al. Structured Bayesian Pruning via Log-Normal Multiplicative Noise, 2017, NIPS.
[28] Martin Jaggi, et al. Sparsified SGD with Memory, 2018, NeurIPS.
[29] Suyog Gupta, et al. To prune, or not to prune: exploring the efficacy of pruning for model compression, 2017, ICLR.
[30] Yi Yang, et al. Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks, 2018, IJCAI.
[31] Max Welling, et al. Learning Sparse Neural Networks through L0 Regularization, 2017, ICLR.
[32] James Zijun Wang, et al. Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers, 2018, ICLR.
[33] Peter Stone, et al. Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science, 2017, Nature Communications.
[34] David Kappel, et al. Deep Rewiring: Training very sparse deep networks, 2017, ICLR.
[35] Miguel Á. Carreira-Perpiñán, et al. "Learning-Compression" Algorithms for Neural Net Pruning, 2018, CVPR.
[36] Gintare Karolina Dziugaite, et al. Stabilizing the Lottery Ticket Hypothesis, 2019, ArXiv.
[37] Jack Xin, et al. Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets, 2019, ICLR.
[38] W. Wen, et al. PruneTrain: Gradual Structured Pruning from Scratch for Faster Neural Network Training, 2019, ArXiv.
[39] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.
[40] Erich Elsen, et al. The State of Sparsity in Deep Neural Networks, 2019, ArXiv.
[41] Sebastian U. Stich, et al. The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication, 2019, ArXiv.
[42] Luke Zettlemoyer, et al. Sparse Networks from Scratch: Faster Training without Losing Performance, 2019, ArXiv.
[43] Martin Jaggi, et al. Error Feedback Fixes SignSGD and other Gradient Compression Schemes, 2019, ICML.
[44] Ping Wang, et al. Gate Decorator: Global Filter Pruning Method for Accelerating Deep Convolutional Neural Networks, 2019, NeurIPS.
[45] Ping Liu, et al. Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration, 2019, CVPR.
[46] Xin Wang, et al. Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization, 2019, ICML.
[47] Niraj K. Jha, et al. NeST: A Neural Network Synthesis Tool Based on a Grow-and-Prune Paradigm, 2017, IEEE Transactions on Computers.
[48] Mingjie Sun, et al. Rethinking the Value of Network Pruning, 2018, ICLR.
[49] Philip H. S. Torr, et al. SNIP: Single-shot Network Pruning based on Connection Sensitivity, 2018, ICLR.