Sensitivity-based acceleration and compression algorithm for convolutional neural networks

Convolutional neural networks (CNNs) deliver impressive performance in image processing and other machine learning tasks, but the colossal computation and memory requirements of most classical models restrict their deployment on portable, power-limited devices. Efficient approaches to circumventing these hindrances shrink the network scale and fall into two categories: network pruning and low-rank approximation of kernel matrices. Compared with pruning schemes, low-rank approximation achieves a lower compression ratio but is far friendlier to parallelism. In this paper, by analyzing the sensitivity of each layer's rank to the network accuracy, we propose a sensitivity-based layer-wise low-rank approximation algorithm. Compared with traditional rank-reduction methods, our proposal improves the acceleration ratio by 20%. When deployed on the VGGNet-16 model, it achieves a 2.7x compression/acceleration ratio on the convolutional layers and a 10.9x compression/acceleration ratio on the fully connected (FC) layers, with only 0.05% top-1 and 0.01% top-5 accuracy loss.
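
The abstract does not spell out how layer sensitivity is measured or how each layer is factorized, so the following is only a minimal sketch of the general idea: each layer's flattened kernel matrix is factorized by truncated SVD, and "sensitivity" is approximated by the relative reconstruction error at each candidate rank as a stand-in for the accuracy loss the paper measures on the real network. The function names (low_rank_factors, rank_sensitivity, choose_rank), the tolerance parameter, and the synthetic weight matrix are all hypothetical.

```python
# Hedged sketch (not the paper's implementation): per-layer truncated SVD with a
# rank-sensitivity scan, using reconstruction error as a proxy for accuracy loss.
import numpy as np

def low_rank_factors(W, rank):
    """Factor W (m x n) into A (m x rank) @ B (rank x n) via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

def rank_sensitivity(W, ranks):
    """Relative Frobenius reconstruction error for each candidate rank."""
    errors = {}
    for r in ranks:
        A, B = low_rank_factors(W, r)
        errors[r] = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
    return errors

def choose_rank(W, ranks, tolerance=0.05):
    """Pick the smallest candidate rank whose proxy sensitivity stays within tolerance."""
    errors = rank_sensitivity(W, ranks)
    feasible = [r for r in ranks if errors[r] <= tolerance]
    return min(feasible) if feasible else max(ranks)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 3x3 conv kernel with 64 input / 128 output channels, flattened to
    # (out_channels, in_channels * k * k) and built to be nearly low-rank so
    # truncation pays off, as trained kernels often are in practice.
    L = rng.standard_normal((128, 20))
    R = rng.standard_normal((20, 64 * 3 * 3))
    W = L @ R + 0.01 * rng.standard_normal((128, 64 * 3 * 3))

    candidates = [8, 16, 32, 64, 96, 128]
    r = choose_rank(W, candidates)
    A, B = low_rank_factors(W, r)
    ratio = W.size / (A.size + B.size)
    print(f"chosen rank = {r}, parameter compression ratio = {ratio:.2f}x")
```

In a real pipeline one would presumably replace the reconstruction-error proxy with the measured top-1/top-5 accuracy drop on a validation set, choose a rank per layer under a global compression or latency budget, and fine-tune the factorized network afterwards; the layer-wise rank choice is exactly what the sensitivity analysis described in the abstract is meant to guide.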
