Feature Fusion for Online Mutual Knowledge Distillation

We propose a learning framework named Feature Fusion Learning (FFL) that efficiently trains a powerful classifier through a fusion module combining the feature maps generated by parallel neural networks. Specifically, we train several parallel neural networks as sub-networks and combine the feature maps from each sub-network with a fusion module to create a more meaningful feature map. The fused feature map is then passed to a fused classifier for overall classification. Unlike existing feature fusion methods, in our framework an ensemble of sub-network classifiers transfers its knowledge to the fused classifier, and the fused classifier in turn delivers its knowledge back to each sub-network, so the two sides teach each other in an online knowledge distillation manner. This mutual teaching scheme not only improves the performance of the fused classifier but also yields a performance gain in each sub-network. Moreover, our framework is more flexible because different network architectures can be used for the sub-networks. Experiments on multiple datasets, including CIFAR-10, CIFAR-100, and ImageNet, show that our method outperforms alternative methods in terms of the performance of both the sub-networks and the fused classifier.
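Below is a minimal PyTorch sketch of the training objective described above: each sub-network is assumed to produce a feature map and logits, a fusion module combines the feature maps and produces fused logits, and the ensemble of sub-network outputs and the fused classifier distill knowledge into each other. The concatenation-plus-1x1-convolution fusion, the temperature value, the equal loss weights, and the detached teacher are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

def kl_soft(student_logits, teacher_logits, T=3.0):
    # KL divergence between temperature-softened distributions.
    # Detaching the teacher is an assumption; the paper's exact gradient
    # flow may differ.
    p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits.detach() / T, dim=1)
    return F.kl_div(p_s, p_t, reduction="batchmean") * (T * T)

class FusionModule(nn.Module):
    # Combines feature maps from parallel sub-networks into one fused map
    # and classifies it (concatenation + 1x1 conv chosen for illustration).
    def __init__(self, in_channels, num_subnets, num_classes):
        super().__init__()
        self.fuse = nn.Conv2d(in_channels * num_subnets, in_channels, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, feature_maps):
        fused = F.relu(self.fuse(torch.cat(feature_maps, dim=1)))
        return self.fc(self.pool(fused).flatten(1))

def ffl_loss(sub_logits, fused_logits, targets):
    # Cross-entropy for every classifier plus bidirectional distillation:
    # the sub-network ensemble teaches the fused classifier, and the fused
    # classifier teaches each sub-network.
    ce = F.cross_entropy(fused_logits, targets)
    ce = ce + sum(F.cross_entropy(l, targets) for l in sub_logits)
    ensemble_logits = torch.stack(sub_logits).mean(dim=0)
    distill = kl_soft(fused_logits, ensemble_logits)          # ensemble -> fused
    distill = distill + sum(kl_soft(l, fused_logits) for l in sub_logits)  # fused -> subs
    return ce + distill

# Hypothetical usage with two sub-networks that each return (feature_map, logits):
#   feats, logits = zip(*(net(x) for net in sub_nets))
#   fused_logits = fusion(list(feats))
#   loss = ffl_loss(list(logits), fused_logits, targets)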
