Focal and Global Knowledge Distillation for Detectors

Knowledge distillation has been applied successfully to image classification. However, object detection is far more complex, and most knowledge distillation methods fail on it. In this paper, we point out that in object detection the features of the teacher and student vary greatly across regions, especially between the foreground and the background. If we distill them equally, these uneven differences between feature maps will harm the distillation. We therefore propose Focal and Global Distillation (FGD). Focal distillation separates the foreground from the background, forcing the student to focus on the teacher’s critical pixels and channels. Global distillation rebuilds the relations between different pixels and transfers them from teacher to student, compensating for the global information that focal distillation misses. Because our method only requires computing the loss on feature maps, FGD can be applied to a wide range of detectors. We experiment on various detectors with different backbones, and the student detectors achieve substantial mAP improvements. For example, with our distillation method, ResNet-50-based RetinaNet, Faster R-CNN, RepPoints, and Mask R-CNN reach 40.7%, 42.0%, 42.0%, and 42.1% mAP on COCO2017, which is 3.3, 3.6, 3.4, and 2.9 points higher than their respective baselines. Our code is available at https://github.com/yzd-v/FGD.
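Both losses operate directly on feature maps, which is what keeps FGD detector-agnostic. The sketch below shows one plausible PyTorch rendering of the two components: a focal term that rasterizes ground-truth boxes into a foreground mask and reweights the student–teacher feature gap by the teacher's spatial and channel attention, and a global term that matches pairwise pixel relations. The mask construction, the attention temperature, the loss weights, and the Gram-style affinity used here in place of the paper's GcBlock relation module are all illustrative assumptions, not the reference implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def boxes_to_fg_mask(gt_boxes, feat_hw, img_hw, device=None):
    """Rasterize ground-truth boxes into a binary foreground mask on a feature map.

    gt_boxes: (N, 4) tensor of (x1, y1, x2, y2) in image pixels.
    feat_hw:  (H, W) of the feature level; img_hw: (H_img, W_img) of the input image.
    """
    H, W = feat_hw
    Hi, Wi = img_hw
    mask = torch.zeros(H, W, device=device)
    for x1, y1, x2, y2 in gt_boxes.tolist():
        j1, i1 = int(x1 / Wi * W), int(y1 / Hi * H)
        j2 = min(max(int(x2 / Wi * W) + 1, j1 + 1), W)
        i2 = min(max(int(y2 / Hi * H) + 1, i1 + 1), H)
        mask[i1:i2, j1:j2] = 1.0  # mark cells covered by any box as foreground
    return mask

def focal_distill_loss(f_s, f_t, fg_mask, alpha=1.0, beta=0.5, temp=0.5):
    """Focal part: split foreground/background, reweight by teacher attention.

    f_s, f_t: student/teacher features, (B, C, H, W); fg_mask: (1 or B, 1, H, W).
    alpha/beta/temp are illustrative hyperparameters, not the paper's values.
    """
    B, C, H, W = f_t.shape
    # Spatial attention from the teacher: where on the map it "looks".
    spatial = F.softmax(f_t.abs().mean(1, keepdim=True).flatten(2) / temp, dim=2)
    spatial = spatial.view(B, 1, H, W) * (H * W)
    # Channel attention from the teacher: which channels matter.
    channel = F.softmax(f_t.abs().mean(dim=(2, 3)) / temp, dim=1).view(B, C, 1, 1) * C
    diff = (f_s - f_t).pow(2) * spatial * channel
    bg_mask = 1.0 - fg_mask
    # Normalize foreground and background separately so neither dominates.
    loss_fg = (diff * fg_mask).sum() / fg_mask.sum().clamp(min=1.0)
    loss_bg = (diff * bg_mask).sum() / bg_mask.sum().clamp(min=1.0)
    return alpha * loss_fg + beta * loss_bg

def global_distill_loss(f_s, f_t, lam=1.0):
    """Global part: match pairwise pixel relations between teacher and student.

    A Gram-style spatial affinity stands in here for the paper's GcBlock module.
    """
    s = F.normalize(f_s.flatten(2), dim=1)   # (B, C, HW), unit-norm per pixel
    t = F.normalize(f_t.flatten(2), dim=1)
    rel_s = torch.bmm(s.transpose(1, 2), s)  # (B, HW, HW) pixel-pair similarities
    rel_t = torch.bmm(t.transpose(1, 2), t)
    return lam * F.mse_loss(rel_s, rel_t)
```

A toy invocation on a single FPN level (random tensors, shapes only) would look like the following; in practice the losses are summed over all levels and added to the detector's task loss.

```python
f_t = torch.randn(2, 256, 32, 32)                    # teacher features
f_s = torch.randn(2, 256, 32, 32)                    # student features
boxes = torch.tensor([[48., 64., 200., 180.]])       # one GT box in image coords
fg = boxes_to_fg_mask(boxes, (32, 32), (512, 512)).view(1, 1, 32, 32)
loss = focal_distill_loss(f_s, f_t, fg) + global_distill_loss(f_s, f_t)
```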
