Network Compression via Recursive Bayesian Pruning

Compression and acceleration of deep neural networks are in critical demand. Bayesian generalizations of structured pruning form an important line of research toward this goal. However, existing Bayesian methods ignore the dependency among neurons and filters for computational simplicity. In this study, we explore, under the Bayesian framework, a structured pruning method that assumes layer-wise sequential dependency, a more general learning setting. Based on a property of the Dirac distribution, we derive a new dropout noise that makes it possible to approximate the posterior of a layer's dropout noise given that of the previous layer. With this Dirac-like dropout noise, we propose a recursive strategy, named Recursive Bayesian Pruning (RBP), that trains and prunes networks in a layer-by-layer fashion. Unimportant neurons and filters are directly targeted and removed, taking into account the influence of the previous layer. Experiments on the typical neural networks LeNet-300-100, LeNet-5, and VGG-16 demonstrate that the proposed method is competitive with, or even outperforms, state-of-the-art methods on several compression and acceleration metrics.
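The layer-by-layer procedure can be illustrated with a short sketch. The code below is a minimal, simplified illustration rather than the paper's variational training objective: each layer carries a per-unit multiplicative noise whose learned mean serves as an importance score, units whose score falls below a threshold are removed, and the mask of surviving units in one layer shrinks the input dimension of the next, mirroring the recursive dependency on the previous layer. The function and variable names (prune_layer, noise_mean, keep_threshold) are illustrative, and the noise means here are random stand-ins for values that would be learned during training.

```python
import numpy as np

# Minimal sketch of recursive, layer-by-layer structured pruning.
# Assumption: each layer has a per-output-unit multiplicative noise whose
# learned mean acts as an importance score; these would come from the
# Bayesian training procedure, not from the random values used below.

rng = np.random.default_rng(0)

def prune_layer(weight, noise_mean, keep_threshold=0.1, prev_keep=None):
    """Remove output units whose noise mean is below the threshold.

    weight:      (out_dim, in_dim) weight matrix of the layer
    noise_mean:  (out_dim,) learned mean of the multiplicative noise
    prev_keep:   boolean mask of units kept in the previous layer; applied
                 to the input dimension first (the recursive step).
    """
    if prev_keep is not None:
        weight = weight[:, prev_keep]      # drop inputs pruned upstream
    keep = noise_mean > keep_threshold     # unimportant units are removed
    # Fold surviving noise means into the weights so the pruned network
    # needs no extra gates at inference time.
    pruned = weight[keep] * noise_mean[keep, None]
    return pruned, keep

# Toy MLP with the layer sizes of LeNet-300-100.
dims = [784, 300, 100, 10]
weights = [rng.standard_normal((dims[i + 1], dims[i])) * 0.05
           for i in range(len(dims) - 1)]
# Stand-ins for learned noise means (hypothetical values).
noise_means = [np.abs(rng.standard_normal(d)) * 0.2 for d in dims[1:]]

prev_keep = None
pruned_weights = []
for l, (w, m) in enumerate(zip(weights, noise_means)):
    if l == len(weights) - 1:
        # Keep all output units of the final layer; only drop pruned inputs.
        pruned_weights.append(w if prev_keep is None else w[:, prev_keep])
        break
    w_pruned, prev_keep = prune_layer(w, m, keep_threshold=0.1,
                                      prev_keep=prev_keep)
    pruned_weights.append(w_pruned)
    print(f"layer {l}: kept {prev_keep.sum()} / {prev_keep.size} units")
```

In the actual method, the noise posteriors are obtained by training, and convolutional filters would be pruned analogously along the channel dimension; this sketch only shows how a keep-mask propagates from one layer to the next.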
