Prime-Aware Adaptive Distillation

Knowledge distillation (KD) aims to improve the performance of a student network by mimicking the knowledge of a powerful teacher network. Existing methods focus on what knowledge should be transferred and treat all samples equally during training. This paper introduces adaptive sample weighting to KD. We find that previously effective hard-mining methods are not well suited to distillation. We therefore propose Prime-Aware Adaptive Distillation (PAD), which incorporates uncertainty learning: PAD identifies the prime samples in distillation and adaptively emphasizes their effect. This view of unequal training is fundamentally different from existing methods and can be used to refine them. As a result, PAD is versatile and applies to a variety of tasks, including classification, metric learning, and object detection. With ten teacher-student combinations on six datasets, PAD improves the performance of existing distillation methods and outperforms recent state-of-the-art approaches.
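As a minimal sketch of how uncertainty learning can induce adaptive sample weighting in distillation, the PyTorch snippet below weights a per-sample feature-imitation loss by a learned per-sample variance, in the spirit of heteroscedastic uncertainty modeling. The module name, the linear variance head, and the exact loss form are illustrative assumptions, not PAD's published implementation.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedDistillLoss(nn.Module):
    """Hypothetical sketch of an uncertainty-weighted distillation loss.

    A small head predicts a per-sample log-variance from the student
    feature. Samples with low predicted variance ("prime" samples)
    receive larger weight via the exp(-log_var) factor, while the
    additive log_var term discourages predicting high uncertainty
    everywhere.
    """

    def __init__(self, feat_dim):
        super().__init__()
        # Assumed variance head: maps a student feature vector to a
        # per-sample log-variance (log sigma^2).
        self.log_var_head = nn.Linear(feat_dim, 1)

    def forward(self, student_feat, teacher_feat):
        # Per-sample squared feature discrepancy against the teacher.
        sq_err = (student_feat - teacher_feat).pow(2).mean(dim=1)
        log_var = self.log_var_head(student_feat).squeeze(1)
        # Gaussian negative-log-likelihood-style weighting:
        # loss_i = exp(-log sigma_i^2) * err_i + log sigma_i^2
        loss = torch.exp(-log_var) * sq_err + log_var
        return loss.mean()

# Example usage with random tensors standing in for features.
criterion = UncertaintyWeightedDistillLoss(feat_dim=512)
s = torch.randn(8, 512)            # student features (requires grad in training)
t = torch.randn(8, 512).detach()   # teacher features (no gradient)
print(criterion(s, t))
```

The key design point this sketch illustrates is that the weighting is learned rather than hand-crafted: the network itself decides, per sample, how strongly to imitate the teacher, rather than applying a fixed hard-mining schedule.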
