Task Decoupled Knowledge Distillation For Lightweight Face Detectors

Face detection is a hot topic in computer vision. The face detection methods usually consist of two subtasks, i.e. the classification subtask and the regression subtask, which are trained with different samples. However, current face detection knowledge distillation methods usually couple the two subtasks, and use the same set of samples in the distillation task. In this paper, we propose a task decoupled knowledge distillation method, which decouples the detection distillation task into two subtasks and uses different samples in distilling the features of different subtasks. We firstly propose a feature decoupling method to decouple the classification features and the regression features, without introducing any extra calculations at inference time. Specifically, we generate the corresponding features by adding task-specific convolutions in the teacher network and adding adaption convolutions on the feature maps of the student network. Then we select different samples for different subtasks to imitate. Moreover, we also propose an effective probability distillation method to joint boost the accuracy of the student network. We apply our distillation method on a lightweight face detector, EagleEye. Experimental results show that the proposed method effectively improves the student detector's accuracy by 5.1%, 5.1%, and 2.8% AP in Easy, Medium, Hard subsets respectively.

[1]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[2]  Jiashi Feng,et al.  Distilling Object Detectors With Fine-Grained Feature Imitation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Xu Zhao,et al.  Real-Time Multi-Scale Face Detector on Embedded Devices , 2019, Sensors.

[4]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Steven C. H. Hoi,et al.  Feature Agglomeration Networks for Single Stage Face Detection , 2017, Neurocomputing.

[9]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Greg Mori,et al.  Similarity-Preserving Knowledge Distillation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[11]  Larry S. Davis,et al.  SSH: Single Stage Headless Face Detector , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[13]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[14]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[15]  Shifeng Zhang,et al.  Single-Shot Refinement Neural Network for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Shifeng Zhang,et al.  Selective Refinement Network for High Performance Face Detection , 2018, AAAI.

[17]  Haifeng Shen,et al.  Learning Better Features for Face Detection with Feature Fusion and Segmentation Supervision , 2018, ArXiv.

[18]  Nikos Komodakis,et al.  Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer , 2016, ICLR.

[19]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[20]  Stan Z. Li,et al.  Learning Lightweight Face Detector with Knowledge Distillation , 2019, 2019 International Conference on Biometrics (ICB).

[21]  Yan Lu,et al.  Relational Knowledge Distillation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Hanqing Lu,et al.  Mask Guided Knowledge Distillation for Single Shot Detector , 2019, 2019 IEEE International Conference on Multimedia and Expo (ICME).

[24]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Yang Liu,et al.  HAMBox: Delving into Online High-quality Anchors Mining for Detecting Outer Faces , 2019, ArXiv.

[26]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[27]  Xu Tang,et al.  PyramidBox: A Context-assisted Single Shot Face Detector , 2018, ECCV.

[28]  Shuo Yang,et al.  WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Yoshua Bengio,et al.  FitNets: Hints for Thin Deep Nets , 2014, ICLR.

[30]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[31]  Andrey Malinin,et al.  Ensemble Distribution Distillation , 2019, ICLR.

[32]  Stefanos Zafeiriou,et al.  RetinaFace: Single-stage Dense Face Localisation in the Wild , 2019, ArXiv.

[33]  Hong-Yuan Mark Liao,et al.  YOLOv4: Optimal Speed and Accuracy of Object Detection , 2020, ArXiv.

[34]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[36]  Shifeng Zhang,et al.  Faceboxes: A CPU real-time and accurate unconstrained face detector , 2019, Neurocomputing.

[37]  Long Chen,et al.  Learning Lightweight Pedestrian Detector with Hierarchical Knowledge Distillation , 2019, 2019 IEEE International Conference on Image Processing (ICIP).

[38]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[39]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.