Sparseness Ratio Allocation and Neuron Re-pruning for Neural Network Compression

Convolutional neural networks (CNNs) are rapidly gaining popularity in artificial intelligence applications and are increasingly deployed on mobile devices. Deployment is challenging, however, because of the high computational complexity of CNNs and the limited hardware resources of mobile devices. Compressing the CNN model is an effective way to address this issue. This work presents a new model-compression framework consisting of sparseness ratio allocation (SRA) and neuron re-pruning (NRP). SRA determines the percentage of weights to prune in each layer so as to achieve a higher overall sparseness ratio. NRP is performed after conventional weight pruning to further remove relatively redundant neurons while preserving accuracy. Experimental results show that, with a slight accuracy drop of 0.1%, the proposed framework achieves 149.3× compression on LeNet-5. The storage size is reduced by about 50% relative to previous works, computational energy by 8–45.2%, and memory-traffic energy by 11.5–48.2%.
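
To make the two steps concrete, the sketch below illustrates one plausible reading of the pipeline: per-layer magnitude pruning driven by allocated sparseness ratios, followed by a neuron re-pruning pass. The layer shapes, the ratio values, and the "few surviving incoming weights" redundancy criterion are illustrative assumptions, not the paper's exact SRA or NRP procedure.

```python
# Minimal sketch of per-layer magnitude pruning with allocated sparseness
# ratios, followed by a simple neuron re-pruning pass. All shapes, ratios,
# and the redundancy criterion below are illustrative assumptions.
import numpy as np

def prune_layer(weights: np.ndarray, sparseness: float) -> np.ndarray:
    """Zero out the smallest-magnitude `sparseness` fraction of weights."""
    k = int(weights.size * sparseness)
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def reprune_neurons(weights: np.ndarray, next_weights: np.ndarray,
                    keep_frac: float = 0.02):
    """Zero output neurons (rows) whose fraction of surviving incoming
    weights falls below `keep_frac`, along with the matching columns of
    the next layer, so the dead neurons carry no computation or storage."""
    surviving = np.count_nonzero(weights, axis=1) / weights.shape[1]
    dead = surviving < keep_frac
    w, nw = weights.copy(), next_weights.copy()
    w[dead, :] = 0.0
    nw[:, dead] = 0.0
    return w, nw

rng = np.random.default_rng(0)
# Hypothetical two fully connected layers (rows = output neurons).
w1 = rng.standard_normal((300, 784))
w2 = rng.standard_normal((10, 300))
ratios = {"w1": 0.95, "w2": 0.80}  # illustrative per-layer allocation

w1 = prune_layer(w1, ratios["w1"])
w2 = prune_layer(w2, ratios["w2"])
w1, w2 = reprune_neurons(w1, w2)

for name, w in (("w1", w1), ("w2", w2)):
    print(f"{name}: {np.count_nonzero(w) / w.size:.1%} weights remain")
```

In this toy setup the earlier, larger layer is assigned a higher sparseness ratio than the smaller output layer, reflecting the idea that per-layer allocation can push the overall compression higher than a single uniform ratio would allow.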