Training Sparse Neural Networks using Compressed Sensing

Pruning the weights of neural networks is an effective and widely used technique for reducing model size and inference complexity. We develop and test a novel method based on compressed sensing that combines pruning and training into a single step. Specifically, we utilize an adaptively weighted $\ell^1$ penalty on the weights during training, which we combine with a generalization of the regularized dual averaging (RDA) algorithm in order to train sparse neural networks. The adaptive weighting we introduce corresponds to a novel regularizer based on the logarithm of the absolute value of the weights. Numerical experiments on the CIFAR-10 and CIFAR-100 datasets demonstrate that our method 1) trains sparser, more accurate networks than existing state-of-the-art methods; 2) can also be used effectively to obtain structured sparsity; 3) can be used to train sparse networks from scratch, i.e., from a random initialization, as opposed to initializing with a well-trained base model; and 4) acts as an effective regularizer, improving generalization accuracy.
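
To make the adaptive weighting concrete, the sketch below shows a plain proximal-gradient step with per-weight $\ell^1$ thresholds proportional to $\lambda/(|w_i| + \epsilon)$, the classical reweighting associated with the log penalty $\lambda \sum_i \log(|w_i| + \epsilon)$. This is only an illustration of the reweighted-$\ell^1$ idea, not the paper's RDA-based algorithm, and the function name, hyperparameters, and toy problem are assumptions introduced here for demonstration.

```python
import numpy as np

def reweighted_l1_prox_step(w, grad, lr, lam, eps=0.1):
    """One proximal-gradient step with an adaptively weighted l1 penalty.

    The per-weight threshold lam / (|w| + eps) is the standard reweighting
    associated with the log penalty sum_i log(|w_i| + eps): small weights are
    penalized more strongly and driven exactly to zero, large weights less so.
    (Illustrative sketch only; the paper combines this adaptive weighting with
    a generalized regularized dual averaging (RDA) update instead.)
    """
    w_half = w - lr * grad                        # gradient step on the data loss
    thresh = lr * lam / (np.abs(w) + eps)         # adaptive per-weight thresholds
    return np.sign(w_half) * np.maximum(np.abs(w_half) - thresh, 0.0)  # soft-threshold

# Toy usage: recover a sparse vector from a few noisy linear measurements.
rng = np.random.default_rng(0)
n, d = 50, 200
w_true = np.zeros(d)
w_true[rng.choice(d, size=5, replace=False)] = rng.normal(size=5)
A = rng.normal(size=(n, d)) / np.sqrt(n)
y = A @ w_true + 0.01 * rng.normal(size=n)

w = np.zeros(d)
for _ in range(500):
    grad = A.T @ (A @ w - y)                      # gradient of 0.5 * ||A w - y||^2
    w = reweighted_l1_prox_step(w, grad, lr=0.1, lam=0.01)

print("nonzeros:", np.count_nonzero(w),
      "relative error:", np.linalg.norm(w - w_true) / np.linalg.norm(w_true))
```

In the paper's setting, the same style of adaptive weights is applied to the weights of a neural network during training, with the proximal step replaced by a generalized RDA update that acts on averaged gradients; that combination is what the abstract credits with producing sparse, accurate networks from a random initialization.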
