Campfire: Compressible, Regularization-Free, Structured Sparse Training for Hardware Accelerators

This paper studies structured sparse training of CNNs with a gradual pruning technique that leads to fixed, sparse weight matrices after a set number of epochs. We simplify the structure of the enforced sparsity so that it reduces the overhead caused by regularization. The proposed training methodology, Campfire, explores pruning at granularities within a convolutional kernel and filter. We study various trade-offs with respect to pruning duration, level of sparsity, and learning rate configuration. We show that our method creates sparse versions of ResNet-50 and ResNet-50 v1.5 on full ImageNet while remaining within a negligible (<1%) margin of accuracy loss. To ensure that this type of sparse training does not harm the robustness of the network, we also demonstrate how the network behaves in the presence of adversarial attacks. Our results show that with a 70% target sparsity, over 75% top-1 accuracy is achievable.
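
As a rough illustration of the training recipe the abstract describes, the sketch below implements gradual, kernel-level magnitude pruning in PyTorch: the per-layer sparsity ramps toward a 70% target and the binary masks are frozen after a set epoch, so the remainder of training runs on a fixed sparse topology. This is a hedged sketch under stated assumptions, not the authors' Campfire implementation; the cubic ramp schedule, the L1 kernel score, and the names current_sparsity, kernel_level_masks, apply_masks, freeze_epoch, and train_one_epoch are all illustrative placeholders.

import torch
import torch.nn as nn

def current_sparsity(epoch, start, end, final_sparsity):
    # Cubic ramp from 0 to final_sparsity between `start` and `end` epochs.
    # The schedule shape is an assumption; the paper studies several settings.
    if epoch < start:
        return 0.0
    if epoch >= end:
        return final_sparsity
    frac = (epoch - start) / float(end - start)
    return final_sparsity * (1.0 - (1.0 - frac) ** 3)

def kernel_level_masks(model, sparsity):
    # Prune whole 2D kernels (one granularity the abstract mentions) by their
    # L1 norm, keeping the largest (1 - sparsity) fraction per conv layer.
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            w = module.weight.data                # (out_ch, in_ch, kH, kW)
            scores = w.abs().sum(dim=(2, 3))      # one L1 score per kernel
            k = int(sparsity * scores.numel())
            if k > 0:
                threshold = scores.flatten().kthvalue(k).values
                keep = (scores > threshold).float()
            else:
                keep = torch.ones_like(scores)
            masks[name] = keep[:, :, None, None]  # broadcast over kH, kW
    return masks

def apply_masks(model, masks):
    # Zero out the pruned kernels in place.
    for name, module in model.named_modules():
        if name in masks:
            module.weight.data.mul_(masks[name])

# Usage sketch (model, train_one_epoch, num_epochs, freeze_epoch are placeholders):
# masks = {}
# for epoch in range(num_epochs):
#     if epoch < freeze_epoch:   # masks still evolving during the pruning window
#         masks = kernel_level_masks(model, current_sparsity(epoch, 0, freeze_epoch, 0.7))
#     apply_masks(model, masks)  # topology is fixed once epoch >= freeze_epoch
#     train_one_epoch(model)
#     apply_masks(model, masks)  # re-zero pruned weights after the optimizer step

Freezing the masks once the target sparsity is reached is what yields the fixed, sparse weight matrices the abstract refers to, and it enforces the structure directly through masking rather than through a sparsity-inducing regularization term.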
