A deep neural network compression algorithm based on knowledge transfer for edge devices

Abstract The limited computation and storage capacity of edge devices severely restricts the deployment of deep neural networks on such devices. Toward intelligent applications on edge devices, we introduce a deep neural network compression algorithm based on knowledge transfer, a three-stage pipeline of lightening, multi-level knowledge transfer, and pruning that reduces the depth, parameter count, and operational complexity of deep neural networks. We first lighten the network by replacing the fully connected layer with a global average pooling layer and replacing standard convolutions with separable convolutions. Next, multi-level knowledge transfer minimizes the difference between the outputs of the "student network" and the "teacher network" at both the middle layer and the logits layer, providing additional supervision when training the "student network". Lastly, we prune the network by cutting off unimportant convolution kernels with a global iterative pruning strategy. Experimental results show that the proposed method is up to 30% more effective than the knowledge distillation method at reducing the loss of classification performance. Benchmarked on a GPU (Graphics Processing Unit) server, a Raspberry Pi 3, and a Cambricon-1A, the compressed network obtained with our knowledge transfer and pruning method achieves more than 49.5 times parameter compression, and the time efficiency of a single feedforward operation is improved by more than 3.2 times.
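Since only the abstract is available here, the following is a minimal PyTorch-style sketch of the pipeline it describes, not the authors' implementation. It illustrates the lightening step (a depthwise separable replacement for a standard convolution), the multi-level knowledge transfer loss (middle-layer hint plus softened-logits distillation plus hard labels), and a global kernel ranking that an iterative pruning loop could consume. All module names, shapes, loss weights (alpha, beta, temperature T), and the L1-norm importance criterion are illustrative assumptions rather than details taken from the paper.

```python
# Sketch of the three compression ingredients described in the abstract.
# Names, weights, and the pruning criterion are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def separable_conv(in_ch, out_ch, stride=1):
    """Depthwise + pointwise pair replacing a standard 3x3 convolution (lightening step)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

def multi_level_kt_loss(student_mid, student_logits,
                        teacher_mid, teacher_logits,
                        labels, adapter, T=4.0, alpha=0.7, beta=0.3):
    """Hard-label loss + middle-layer (hint) loss + logits-layer distillation loss."""
    # 1) Standard cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    # 2) Middle-layer transfer: match the student's intermediate feature map to the
    #    teacher's, after a 1x1 adapter aligns the channel dimensions.
    hint = F.mse_loss(adapter(student_mid), teacher_mid)

    # 3) Logits-layer transfer: KL divergence between temperature-softened distributions.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)

    return ce + alpha * kd + beta * hint

def rank_kernels_globally(conv_layers):
    """Rank every convolution kernel across all layers by L1 norm (a common importance
    proxy; the paper's actual criterion may differ). The smallest-scoring kernels would
    be removed first in each round of an iterative prune-and-finetune loop."""
    scores = []
    for li, conv in enumerate(conv_layers):
        # conv.weight has shape (out_channels, in_channels, kH, kW): one score per kernel.
        norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))
        scores += [(li, ki, float(n)) for ki, n in enumerate(norms)]
    return sorted(scores, key=lambda s: s[2])
```

In use, the teacher would run in eval mode with gradients disabled, while only the student (and the small 1x1 adapter) are updated by the combined loss; pruning and fine-tuning would then alternate until the target compression ratio is reached.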
