Multi-Scale Real-Time Object Detection With Densely Connected Network

In this paper, we address object detection that is both accurate and fast. To this end, we modify an existing object detection algorithm in three ways: we design a new feature extraction network, predict detection targets from multiple feature maps, and improve the anchor boxes. In particular, because DenseNet uses skip connections to combine the outputs of preceding convolutional layers, which makes predictions more accurate, we introduce DenseNet into the YOLOv3-Tiny network. Furthermore, we add a prediction layer to the improved network so that it detects more true targets. Finally, we cluster the ground-truth boxes of the specific dataset to obtain the anchor boxes used for training, which improves the detection results of the improved network. We demonstrate the effectiveness of our network on the Pascal VOC datasets. The experimental results show that the improved network increases detection accuracy by 14% and runs at 63 frames per second in video detection. Detection accuracy thus rises to a higher level while detection speed still meets the real-time detection requirement, which shows that the proposed method is robust and effective.

Keywords: Anchor box, DenseNet, Video detection
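The anchor-box step can be illustrated with a minimal sketch. The abstract does not say which clustering method is used; the code below assumes k-means over ground-truth box widths and heights with IoU as the similarity measure, as popularized by YOLO9000. The function names `iou_wh` and `kmeans_anchors`, the parameters, and the sample boxes are hypothetical and are not taken from the paper.

```python
import numpy as np


def iou_wh(boxes, centroids):
    """IoU between (width, height) pairs, assuming all boxes share a corner."""
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    area_boxes = boxes[:, 0] * boxes[:, 1]
    area_centroids = centroids[:, 0] * centroids[:, 1]
    return inter / (area_boxes[:, None] + area_centroids[None, :] - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster ground-truth (width, height) pairs into k anchor boxes.

    Assumption: each box is assigned to the centroid with the highest IoU
    (equivalently, the smallest 1 - IoU distance), and each centroid is
    updated to the mean of its members.
    """
    rng = np.random.default_rng(seed)
    boxes = np.asarray(boxes, dtype=float)
    # Initialise centroids from k distinct ground-truth boxes.
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    assign = np.full(len(boxes), -1)
    for _ in range(iters):
        new_assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        if np.array_equal(new_assign, assign):
            break  # assignments are stable, so the clustering has converged
        assign = new_assign
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    # Return anchors sorted by area, smallest first.
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]


# Hypothetical box sizes in pixels: three small and three large objects.
sample_boxes = [[10, 12], [12, 10], [11, 11], [100, 90], [90, 100], [95, 95]]
print(kmeans_anchors(sample_boxes, k=2))
```

In practice the input would be the widths and heights of every labelled box in the training set, and `k` would equal the total number of anchors across the prediction layers.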
