Using Distillation to Improve Network Performance after Pruning and Quantization

As the complexity of processing tasks increases, deep neural networks demand ever more computing and storage resources. At the same time, researchers have found that deep neural networks contain substantial redundancy, which wastes resources unnecessarily, so the network model needs further optimization. Motivated by this, recent work has turned to building more compact and efficient models, so that deep neural networks can be deployed on resource-constrained nodes and make them more intelligent. Current compression methods for deep neural network models include weight pruning, weight quantization, and knowledge distillation; the three methods have their own characteristics, are independent of one another and self-contained, and can be further improved by combining them effectively. This paper constructs a deep neural network model compression framework based on weight pruning, weight quantization, and knowledge distillation. First, the model undergoes double coarse-grained compression through pruning and quantization; then the original network is used as the teacher network to guide the training of the compressed student network and improve its accuracy, so that the model is further accelerated and compressed with a smaller loss of accuracy. Experimental results show that the combination of the three algorithms reduces FLOPs by 80% while lowering accuracy by only 1%.
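The following is a minimal sketch of the three-stage pipeline described above, assuming a PyTorch setting. The helper names (magnitude_prune, fake_quantize, distillation_loss, fine_tune_with_distillation) and the sparsity, bit width, temperature, and loss-weighting values are illustrative assumptions, not the paper's exact method or hyperparameters.

```python
# Sketch: prune -> quantize -> distill from the original (teacher) network.
# Assumed hyperparameters; not the paper's exact settings.
import torch
import torch.nn as nn
import torch.nn.functional as F


def magnitude_prune(model: nn.Module, sparsity: float = 0.5) -> None:
    """Zero out the smallest-magnitude weights in every Conv/Linear layer."""
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            w = module.weight.data
            k = max(1, int(sparsity * w.numel()))
            threshold = w.abs().flatten().kthvalue(k).values
            w.mul_((w.abs() > threshold).float())


def fake_quantize(model: nn.Module, num_bits: int = 8) -> None:
    """Simulate uniform weight quantization by rounding weights to a fixed grid."""
    levels = 2 ** num_bits - 1
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            w = module.weight.data
            scale = (w.max() - w.min()) / levels
            if scale > 0:
                w.copy_(((w - w.min()) / scale).round() * scale + w.min())


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.7):
    """Hinton-style KD loss: soft-target KL term plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


def fine_tune_with_distillation(teacher, student, loader, epochs: int = 1):
    """Recover the pruned-and-quantized student's accuracy using the
    uncompressed original network as the teacher."""
    teacher.eval()
    optimizer = torch.optim.SGD(student.parameters(), lr=1e-3, momentum=0.9)
    for _ in range(epochs):
        for images, labels in loader:
            with torch.no_grad():
                teacher_logits = teacher(images)
            student_logits = student(images)
            loss = distillation_loss(student_logits, teacher_logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Note that the sketch uses unstructured magnitude pruning and post-training fake quantization only to illustrate the sequencing of the three stages; the coarse-grained (e.g., filter-level) pruning and quantization schemes actually used in the framework may differ.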
