Research on Small Size Object Detection in Complex Background

In object detection, small objects are difficult to detect because they are often tightly grouped and subject to interference from background information. To address this problem, we propose a novel network architecture based on YOLOv3 together with a new feature fusion mechanism. We add multi-scale convolution kernels with differing receptive fields to YOLOv3, using an Inception-like architecture to extract the semantic features of objects. We also optimize the feature fusion weights by selecting appropriate channel number ratios. Our model outperforms YOLOv3 on small objects that tend to cluster, such as airplanes, birds, and people, while its detection speed remains comparable to that of YOLOv3.
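The abstract gives no layer-level details, so the following is only a minimal sketch of the general idea it names: an Inception-like block that runs convolutions with several kernel sizes in parallel and concatenates the results, with the output channels divided among the branches by a fixed ratio. The kernel sizes (1, 3, 5), the ratio (2, 1, 1), the random weights, and the helper names `conv2d_same`, `split_channels`, and `multi_scale_block` are all illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1, zero-padded 'same' convolution (cross-correlation).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, _, k, _ = w.shape
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Input window shifted by kernel offset (i, j), shape (C_in, H, W).
            patch = xp[:, i:i + h, j:j + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out

def split_channels(total, ratios):
    """Divide `total` output channels among branches in proportion to `ratios`."""
    s = sum(ratios)
    counts = [total * r // s for r in ratios]
    counts[0] += total - sum(counts)  # rounding remainder goes to the first branch
    return counts

def multi_scale_block(x, total_out, ratios=(2, 1, 1), kernels=(1, 3, 5), seed=0):
    """Parallel branches with different receptive fields, fused by channel concatenation.
    Ratios and kernel sizes are hypothetical; weights are random placeholders."""
    rng = np.random.default_rng(seed)
    branches = []
    for c_out, k in zip(split_channels(total_out, ratios), kernels):
        w = rng.standard_normal((c_out, x.shape[0], k, k)) * 0.1
        branches.append(np.maximum(conv2d_same(x, w), 0.0))  # ReLU activation
    return np.concatenate(branches, axis=0)

x = np.random.default_rng(1).standard_normal((8, 16, 16))  # dummy feature map
y = multi_scale_block(x, total_out=32)
print(y.shape)
```

Changing `ratios` shifts how much of the fused feature map comes from small versus large receptive fields, which is one plausible reading of "selecting appropriate channel number ratios"; the paper itself may implement this differently.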
