FeTa: A DCA Pruning Algorithm with Generalization Error Guarantees

Recent DNN pruning algorithms have succeeded in reducing the number of parameters in fully connected layers, often with little or no drop in classification accuracy. However, most existing pruning schemes either have to be applied during training or require a costly retraining procedure after pruning to regain classification accuracy. We start by proposing a cheap pruning algorithm for fully connected DNN layers, based on difference of convex functions (DC) optimisation, that requires little or no retraining. We then provide a theoretical analysis of the growth in the Generalization Error (GE) of a DNN under bounded perturbations to the hidden layers, of which weight pruning is a special case. Our pruning method is orders of magnitude faster than competing approaches, while our theoretical analysis sheds light on previously observed problems in DNN pruning. Experiments on common feedforward neural networks validate our results.
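
As a concrete illustration of the DC optimisation underlying this kind of layer-wise pruning, the sketch below runs a generic DCA loop on an assumed surrogate objective: a least-squares fit of the pruned layer's pre-activations to those of the dense layer, plus a capped-ℓ1 sparsity penalty written as a difference of two convex functions. The objective, the capped-ℓ1 split, and the function name dca_prune are illustrative assumptions for exposition, not the paper's FeTa formulation.

```python
# Minimal DCA (difference-of-convex algorithm) sketch for layer-wise pruning.
# Assumed surrogate objective (not the paper's FeTa objective):
#   f(W) = 1/2 ||X W - Y||_F^2 + lam * sum(min(|W_ij|, theta))
# The capped-l1 penalty is split as a difference of convex functions:
#   g(W) = 1/2 ||X W - Y||_F^2 + lam * ||W||_1      (convex)
#   h(W) = lam * sum(max(|W_ij| - theta, 0))        (convex)
# so f = g - h.  DCA linearizes h at the current iterate and solves the
# remaining convex problem, here with a few proximal-gradient (ISTA) steps.
import numpy as np


def soft_threshold(Z, tau):
    """Proximal operator of tau * ||.||_1 (entrywise soft-thresholding)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)


def dca_prune(X, Y, W0, lam=1e-2, theta=1e-1, outer_iters=10, inner_iters=50):
    """Sparsify W0 by approximately minimizing the DC objective above."""
    W = W0.copy()
    # Lipschitz constant of the gradient of the quadratic term: ||X||_2^2.
    L = np.linalg.norm(X, 2) ** 2
    for _ in range(outer_iters):
        # Subgradient of the concave part h at the current iterate:
        # V_ij = lam * sign(W_ij) if |W_ij| > theta, else 0.
        V = lam * np.sign(W) * (np.abs(W) > theta)
        # Convex subproblem: min_W g(W) - <V, W>, handled with ISTA steps.
        for _ in range(inner_iters):
            grad = X.T @ (X @ W - Y) - V
            W = soft_threshold(W - grad / L, lam / L)
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((256, 64))        # activations feeding the layer
    W_dense = rng.standard_normal((64, 32))   # dense pretrained weights
    Y = X @ W_dense                           # targets: dense pre-activations
    W_sparse = dca_prune(X, Y, W_dense)
    print("fraction of zero weights:", np.mean(W_sparse == 0.0))
```

At each outer iteration the concave part of the objective is linearized at the current weights and the remaining convex problem is solved approximately; in an actual implementation the surrogate objective, penalty, and inner solver would follow the paper's formulation rather than this sketch.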
