One-stage object detection knowledge distillation via adversarial learning

Impressive object detection methods based on convolutional neural networks (CNNs) have been proposed; however, they usually rely on computationally expensive deep networks to achieve their performance. Knowledge distillation has recently attracted much attention in image classification because it yields compact models that reduce computation while preserving performance. Moreover, the best-performing deep neural networks often ensemble the outputs of multiple networks by averaging. However, the memory required to store these networks and the time required to execute them at inference prohibit their use in real-time applications. In this paper, we present a knowledge distillation method for one-stage object detection that can distill a variety of large, complex trained networks into a single lightweight network. To transfer diverse knowledge from various trained one-stage object detection networks, an adversarial learning strategy is employed as supervision: it guides and optimizes the lightweight student network to recover the knowledge of the teacher networks while simultaneously training a discriminator module to distinguish teacher features from student features. The proposed method has two main advantages: (1) the lightweight student model learns the knowledge of the teacher, which contains richer discriminative information than a model trained from scratch; (2) inference is faster than with traditional ensembles of multiple networks. Extensive experiments on the PASCAL VOC and MS COCO datasets verify the effectiveness of the proposed method for one-stage object detection: it obtains mAP improvements of 3.43%, 2.48%, and 5.78% for the vgg11-ssd, mobilenetv1-ssd-lite, and mobilenetv2-ssd-lite student networks, respectively, on the PASCAL VOC 2007 dataset. Furthermore, with the multi-teacher ensemble method, vgg11-ssd gains a remarkable 7.10% improvement.
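
The adversarial supervision described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the linear discriminator, the element-wise averaging of teacher features, the toy feature vectors, and every function name below are assumptions chosen only to show how a discriminator loss and a student "fooling" loss interact.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator(feature, weights, bias):
    """Hypothetical linear discriminator: probability that `feature` is a teacher feature."""
    score = sum(w * f for w, f in zip(weights, feature)) + bias
    return sigmoid(score)


def average_teachers(teacher_features):
    """Assumed ensemble rule: element-wise mean of the teachers' feature vectors."""
    n = len(teacher_features)
    return [sum(values) / n for values in zip(*teacher_features)]


def discriminator_loss(p_teacher, p_student):
    """Binary cross-entropy with teacher features labeled 1 and student features labeled 0."""
    return -(math.log(p_teacher) + math.log(1.0 - p_student))


def student_adversarial_loss(p_student):
    """The student's loss falls as the discriminator mistakes its features for a teacher's."""
    return -math.log(p_student)


# Toy feature vectors (made up for illustration, not taken from the paper).
teachers = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
target = average_teachers(teachers)
student = [0.5, 0.5, 0.5]
weights, bias = [0.5, 0.5, 0.5], -1.5

p_t = discriminator(target, weights, bias)
p_s = discriminator(student, weights, bias)
d_loss = discriminator_loss(p_t, p_s)    # minimized when updating the discriminator
s_loss = student_adversarial_loss(p_s)   # minimized when updating the student
```

In a real training loop the two losses would be minimized alternately, with gradients flowing into a detector backbone rather than fixed vectors; only the loss structure is shown here.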
