Coarse-Grained Pruning of Neural Network Models Based on Blocky Sparse Structure

Deep neural networks achieve excellent performance in many research fields, but many models are over-parameterized: computing their weight matrices consumes considerable time and demands substantial computing resources. To address these problems, this paper proposes a novel block-based division method and a coarse-grained block pruning strategy to simplify and compress the fully connected structure; the pruned weight matrices, which retain a blocky structure, are then stored in the Block Sparse Row (BSR) format to accelerate weight-matrix computation. First, the weight matrices are divided into square sub-blocks based on spatial aggregation. Second, a coarse-grained block pruning procedure is applied to reduce the number of model parameters. Finally, the BSR storage format, which is well suited to storing and computing with block-sparse matrices, is used to hold the remaining dense weight blocks and speed up the calculation. Experiments on the MNIST and Fashion-MNIST datasets examine how classification accuracy varies with pruning granularity and sparsity. The results show that the coarse-grained block pruning method compresses the network and reduces computational cost without greatly degrading classification accuracy, and an experiment on the CIFAR-10 dataset shows that the block pruning strategy also combines well with convolutional networks.

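To make the pipeline concrete, the sketch below illustrates the three steps the abstract describes: partitioning a fully connected weight matrix into square sub-blocks, pruning whole blocks at coarse granularity, and storing the result in BSR form for sparse matrix-vector products. It is a minimal illustration, not the authors' implementation: the `block_prune` function, the L1-norm block-scoring criterion, and the `block_size`/`sparsity` parameters are assumptions chosen for clarity, using SciPy's standard `bsr_matrix` container.

```python
import numpy as np
from scipy.sparse import bsr_matrix


def block_prune(weights, block_size, sparsity):
    """Coarse-grained block pruning sketch: zero out whole square sub-blocks
    and store the pruned matrix in Block Sparse Row (BSR) format.

    The L1-norm block score and the single-shot pruning schedule are
    illustrative assumptions, not the paper's exact procedure.
    """
    rows, cols = weights.shape
    assert rows % block_size == 0 and cols % block_size == 0
    # View the matrix as a grid of (block_size x block_size) sub-blocks:
    # shape becomes (row_blocks, col_blocks, block_size, block_size).
    grid = (weights.reshape(rows // block_size, block_size,
                            cols // block_size, block_size)
                   .swapaxes(1, 2).copy())
    # Score each block by the aggregated magnitude of its weights.
    scores = np.abs(grid).sum(axis=(2, 3))
    # Zero out the `sparsity` fraction of blocks with the smallest scores.
    k = int(sparsity * scores.size)
    if k > 0:
        threshold = np.partition(scores.ravel(), k - 1)[k - 1]
        grid[scores <= threshold] = 0.0
    pruned = grid.swapaxes(1, 2).reshape(rows, cols)
    # BSR keeps only the surviving dense blocks, so matrix-vector products
    # skip the pruned regions entirely.
    return bsr_matrix(pruned, blocksize=(block_size, block_size))


# Usage example: prune 80% of the 16x16 blocks of a 784x256 layer.
W = np.random.randn(784, 256).astype(np.float32)
W_bsr = block_prune(W, block_size=16, sparsity=0.8)
x = np.random.randn(256).astype(np.float32)
y = W_bsr @ x  # sparse matrix-vector product over the remaining blocks
```

Larger `block_size` values coarsen the pruning granularity and improve memory locality at inference time, at the cost of less flexibility in which weights can be removed; this is the accuracy-versus-granularity trade-off the experiments explore.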