Mobile CenterNet for Embedded Deep Learning Object Detection

Object detection is a fundamental task in computer vision with broad application prospects. In recent years, many novel methods have been proposed to tackle it. However, most algorithms suffer from high computational cost and long inference time, which makes them impractical to deploy on embedded devices in real industrial scenarios. In this paper, we propose Mobile CenterNet to address this problem. Our method is based on CenterNet, with several key improvements. To enhance detection performance, we adopt HRNet as a powerful backbone and introduce a category-balanced focal loss to deal with the category imbalance problem. Moreover, to compress the model and reduce inference time, we employ knowledge distillation to transfer knowledge from a cumbersome model to a compact one. We conduct experiments on BDD100K, a large traffic detection dataset, and validate the effectiveness of all the modifications. Finally, our method achieved 1st place in the Embedded Deep Learning Object Detection Model Compression Competition held at ICME 2020.
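The abstract does not give formulas for the two loss terms it names, so the following is only a minimal sketch of the general ideas, not the formulation used in the paper. It assumes a standard binary focal loss scaled by a per-category weight, and a temperature-softened KL divergence between teacher and student class logits for distillation. The inverse-frequency weighting in `category_weights`, the function names, and the default values of `gamma` and `temperature` are all hypothetical choices made for illustration.

```python
import math


def category_weights(counts):
    """Hypothetical weighting: inverse instance frequency, normalised to mean 1."""
    inv = [1.0 / c for c in counts]
    mean = sum(inv) / len(inv)
    return [v / mean for v in inv]


def focal_loss(p, target, weight=1.0, gamma=2.0, eps=1e-12):
    """Binary focal loss for one predicted probability p and a target in {0, 1}.

    `weight` is the per-category factor (an assumption of this sketch).
    """
    p_t = p if target == 1 else 1.0 - p
    return -weight * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Usage: a rare category (10 instances) gets a larger weight than a common one.
weights = category_weights([1000, 10])
rare_loss = focal_loss(0.3, 1, weight=weights[1])
common_loss = focal_loss(0.3, 1, weight=weights[0])
kd = distillation_loss([1.0, 0.5, -0.2], [2.0, 0.1, -1.0])
```

With this weighting, the same prediction error costs more on the rare category than on the common one, which is the intuition behind balancing a loss across categories. A detector built on CenterNet would more likely apply such terms to per-pixel heatmaps than to a single probability or logit vector.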
