Distilling Object Detectors with Feature Richness

In recent years, large-scale deep models have achieved great success, but their huge computational complexity and massive storage requirements make them challenging to deploy on resource-limited devices. As a model compression and acceleration method, knowledge distillation effectively improves the performance of small models by transferring dark knowledge from the teacher detector. However, most existing distillation-based detection methods mainly imitate features near bounding boxes, which suffers from two limitations. First, they ignore the beneficial features outside the bounding boxes. Second, they imitate some features that are mistakenly regarded as background by the teacher detector. To address these issues, we propose a novel Feature-Richness Score (FRS) method to choose the important features that improve generalized detectability during distillation. The proposed method effectively retrieves important features outside the bounding boxes and removes detrimental features within them. Extensive experiments show that our method achieves excellent performance on both anchor-based and anchor-free detectors. For example, RetinaNet with ResNet-50 achieves 39.7% mAP on the COCO2017 dataset, surpassing even the ResNet-101-based teacher detector (38.9%) by 0.8%.
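To make the idea concrete, below is a minimal PyTorch-style sketch of feature-richness-weighted distillation, assuming the richness score at each spatial location is derived from the teacher's classification output rather than from ground-truth boxes. The function and argument names (e.g. `frs_distill_loss`, `adapt`) are illustrative, not the authors' released code.

```python
import torch

def frs_distill_loss(student_feat, teacher_feat, teacher_cls_logits, adapt=None):
    """Feature-richness-weighted imitation loss for one FPN level (illustrative sketch).

    student_feat:       (N, C_s, H, W) student FPN feature
    teacher_feat:       (N, C_t, H, W) teacher FPN feature at the same level
    teacher_cls_logits: (N, A*K, H, W) teacher classification logits
                        (A anchors per location, K classes; A = 1 for anchor-free heads)
    adapt:              optional 1x1 conv mapping C_s -> C_t when channel widths differ
    """
    if adapt is not None:
        student_feat = adapt(student_feat)

    # Feature-Richness Score: the highest class probability the teacher predicts
    # at each location, computed independently of ground-truth bounding boxes.
    with torch.no_grad():
        probs = teacher_cls_logits.sigmoid()            # (N, A*K, H, W)
        frs = probs.max(dim=1, keepdim=True).values     # (N, 1, H, W)

    # Weight the per-location squared feature error by the richness score, so
    # informative regions (inside or outside boxes) dominate, while regions the
    # teacher treats as pure background contribute little.
    diff = (student_feat - teacher_feat.detach()) ** 2  # (N, C_t, H, W)
    return (frs * diff).sum() / frs.sum().clamp(min=1e-6)
```

In practice such a loss would be evaluated per FPN level, summed across levels, scaled by a distillation weight, and added to the student's standard detection losses.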
