Paper information - Multi-flow Sub-network and Multiple Connections for Single Shot Detection

One-stage object detection methods are usually more computationally efficient than two-stage methods, which makes them more likely to be applied in practice. However, one-stage methods often suffer from lower detection accuracy, especially when the objects to be detected are small. In this paper, we propose a multi-flow sub-network and multiple connections for single shot detection (MSSD), which builds on a one-stage strategy to retain its computational efficiency while improving detection accuracy. The multi-flow sub-network in MSSD aims to extract high-quality feature maps with high spatial resolution, sufficient non-linear transformation, and multiple receptive fields, which facilitates the detection of small objects in particular. In addition, MSSD uses multiple connections, including up-sampling, down-sampling, and resolution-invariant connections, to combine feature maps from different layers, which helps the model capture fine-grained details and improves feature representation. Extensive experiments on PASCAL VOC and MS COCO demonstrate that MSSD achieves competitive detection accuracy with high computational efficiency compared to state-of-the-art methods. MSSD with an input size of 320 × 320 achieves 80.6% mAP on VOC2007 at 45 FPS and 29.7% mAP on COCO, both on an NVIDIA Titan X GPU.
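The idea of combining feature maps through up-sampling, down-sampling, and resolution-invariant connections can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`upsample_nearest`, `downsample_maxpool`, `fuse`), the choice of nearest-neighbour up-sampling and max pooling, the fusion by channel concatenation, and the feature-map sizes are all assumptions made for illustration only.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour up-sampling of a (C, H, W) feature map (assumed operator)."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def downsample_maxpool(x, factor):
    """Non-overlapping max pooling of a (C, H, W) feature map (assumed operator)."""
    c, h, w = x.shape
    assert h % factor == 0 and w % factor == 0
    return x.reshape(c, h // factor, factor, w // factor, factor).max(axis=(2, 4))

def fuse(fine, same, coarse):
    """Bring three maps to the resolution of `same` and concatenate along channels.

    fine   -> down-sampling connection
    same   -> resolution-invariant connection (passed through unchanged)
    coarse -> up-sampling connection
    """
    fine_ratio = fine.shape[1] // same.shape[1]
    coarse_ratio = same.shape[1] // coarse.shape[1]
    return np.concatenate(
        [
            downsample_maxpool(fine, fine_ratio),
            same,
            upsample_nearest(coarse, coarse_ratio),
        ],
        axis=0,
    )

# Hypothetical feature maps at three resolutions (channel counts are arbitrary).
rng = np.random.default_rng(0)
fine = rng.standard_normal((8, 40, 40))
same = rng.standard_normal((16, 20, 20))
coarse = rng.standard_normal((32, 10, 10))

fused = fuse(fine, same, coarse)
print(fused.shape)  # (56, 20, 20)
```

The fused map carries information from the finer layer (detail), the same-resolution layer, and the coarser layer (context) at a single spatial size. A real detector would typically use learned layers rather than these fixed operators.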
