Embedded Knowledge Distillation in Depth-level Dynamic Neural Network

In real applications, devices with different computational resources require networks of different depths (e.g., ResNet-18/34/50) with high accuracy. Existing strategies typically either design multiple networks (nets) and train them independently, or use compression techniques (e.g., low-rank decomposition, pruning, and teacher-to-student distillation) to derive a small net from a trained large model. The former suffers from the low accuracy of small nets, and the latter from complicated training processes that depend on accompanying assistive large models. In this article, we propose a Depth-level Dynamic Neural Network (DDNN) that integrates sub-nets of different depths sharing a similar architecture. Instead of training individual nets with different depth configurations, we train a single DDNN that dynamically switches among different-depth sub-nets at runtime using one set of shared weight parameters. To improve the generalization of the sub-nets, we design an Embedded Knowledge Distillation (EKD) training mechanism for the DDNN that transfers semantic knowledge from the teacher (full) net to the sub-nets. Specifically, a Kullback-Leibler divergence term constrains the consistency of the posterior class probabilities between the full-net and each sub-net, and a self-attention term on same-resolution features at different depths encourages richer feature representations in the sub-nets. We thus obtain multiple high-accuracy sub-nets simultaneously within one DDNN via online knowledge distillation at each training iteration, without extra computation cost. Extensive experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate that sub-nets in a DDNN trained with EKD achieve better performance than depth-level pruning or individual training, while the full-net preserves its original performance.
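
For concreteness, below is a minimal PyTorch-style sketch of the kind of per-iteration distillation objective the abstract describes: a temperature-softened KL term aligning a sub-net's class posterior with the full-net's, plus a feature-consistency term on same-resolution attention maps. The temperature, loss weights, and the particular attention-map formulation are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of an EKD-style training objective (assumed form, not the paper's code).
import torch
import torch.nn.functional as F

def kd_kl_loss(subnet_logits: torch.Tensor,
               fullnet_logits: torch.Tensor,
               T: float = 4.0) -> torch.Tensor:
    """Temperature-softened KL divergence between full-net and sub-net posteriors."""
    log_p_sub = F.log_softmax(subnet_logits / T, dim=1)
    p_full = F.softmax(fullnet_logits.detach() / T, dim=1)  # teacher signal is not updated by this term
    return F.kl_div(log_p_sub, p_full, reduction="batchmean") * (T * T)

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    """Spatial attention map: channel-wise mean of squared activations, L2-normalized (assumed formulation)."""
    a = feat.pow(2).mean(dim=1)               # (N, H, W)
    return F.normalize(a.flatten(1), dim=1)   # (N, H*W)

def ekd_loss(subnet_logits, fullnet_logits, subnet_feat, fullnet_feat,
             labels, alpha: float = 1.0, beta: float = 1.0, T: float = 4.0):
    """Total sub-net loss: cross-entropy + posterior KL + attention-map consistency."""
    ce = F.cross_entropy(subnet_logits, labels)
    kl = kd_kl_loss(subnet_logits, fullnet_logits, T)
    at = F.mse_loss(attention_map(subnet_feat), attention_map(fullnet_feat.detach()))
    return ce + alpha * kl + beta * at
```

In a training iteration, the full-net forward pass would supply fullnet_logits and a same-resolution feature fullnet_feat; each active sub-net, sharing the same weights up to its depth, would then be optimized with ekd_loss alongside the full-net's own cross-entropy loss.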
