Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers

We present Dynamic Sparse Training, a novel network pruning algorithm that jointly finds the optimal network parameters and the sparse network structure in a single, unified optimization process with trainable pruning thresholds. These thresholds are adjusted dynamically, in a fine-grained and layer-wise manner, via backpropagation. We demonstrate that Dynamic Sparse Training can easily train very sparse neural network models with little performance loss using the same number of training epochs as their dense counterparts, and that it achieves state-of-the-art performance compared with other sparse training algorithms on various network architectures. In addition, we report several surprising observations that provide strong evidence for the effectiveness and efficiency of our algorithm. These observations reveal underlying problems of the traditional three-stage pruning pipeline and show how our algorithm can guide the design of more compact network architectures.
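To make the idea of a trainable masked layer concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it assumes a hard step mask made differentiable with a straight-through estimator and an exponential regularizer that pushes the learned thresholds upward. The names (BinaryStep, MaskedLinear), the per-neuron threshold shape, and the regularization coefficient are illustrative assumptions; the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinaryStep(torch.autograd.Function):
    """Hard 0/1 mask in the forward pass; straight-through estimator in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Let gradients pass near the threshold boundary, suppress them far away.
        return grad_output * (x.abs() <= 1).float()


class MaskedLinear(nn.Module):
    """Linear layer whose weights are masked by a trainable, layer-wise threshold."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        # One trainable threshold per output neuron (a single scalar per layer also works).
        self.threshold = nn.Parameter(torch.zeros(out_features, 1))
        nn.init.kaiming_uniform_(self.weight)

    def forward(self, x):
        # Keep a weight only if its magnitude exceeds the learned threshold.
        mask = BinaryStep.apply(self.weight.abs() - self.threshold)
        return F.linear(x, self.weight * mask, self.bias)

    def sparsity(self):
        with torch.no_grad():
            kept = (self.weight.abs() > self.threshold).float()
            return 1.0 - kept.mean().item()


# Illustrative training step: the extra term penalizes small thresholds,
# so minimizing it drives the thresholds up and prunes more weights.
layer = MaskedLinear(784, 10)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
opt.zero_grad()
loss = F.cross_entropy(layer(x), y) + 1e-3 * torch.exp(-layer.threshold).sum()
loss.backward()
opt.step()
print(f"layer sparsity: {layer.sparsity():.3f}")
```

Because both the weights and the thresholds receive gradients, the sparsity of each layer can grow or shrink during training rather than being fixed by a global pruning schedule.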
