Improvement of Non-Maximum Suppression in RGB-D Object Detection

Non-maximum suppression (NMS) is a commonly used post-processing step in object detection. However, because its constraint condition is simple, relying only on box overlap and score, NMS cannot effectively avoid missed and false detections. To address the poor performance of traditional NMS in dense scenes with highly overlapping objects, we design an RGB-D object detection network based on the YOLO v3 framework that fuses RGB and depth information level by level at the middle stage of the network, and we propose an improved NMS algorithm that incorporates depth features. The depth of the object inside each detection box is used to decide whether highly overlapping detection boxes contain the same object. The average depth of the pixels inside each detection box is computed as a penalty term, and this penalty term is added to the detection box score to obtain a new constraint condition for non-maximum suppression. Experimental results on the NYU Depth V2 dataset show that the mean average precision (mAP) of the proposed Depth Fusion NMS algorithm is 0.8%, 0.5% and 0.3% higher than that of Greedy-NMS, Soft NMS-L and Soft NMS-G, respectively. Comparison and analysis show that our method not only detects more overlapping objects but also localizes objects more accurately.
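
The depth-aware suppression idea can be illustrated with a minimal sketch. The abstract does not give the exact penalty formula, so the code below is an assumption rather than the paper's method: it replaces the score penalty with a simpler hard gate, suppressing a box only when it overlaps a kept box strongly and their mean depths are close. The names `depth_gated_nms`, `iou_thr` and `depth_tol`, and the default threshold values, are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_depth(depth, box):
    """Average depth of the pixels inside a box (x2 and y2 are exclusive)."""
    x1, y1, x2, y2 = box
    values = [depth[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return sum(values) / len(values) if values else 0.0


def depth_gated_nms(boxes, scores, depth, iou_thr=0.5, depth_tol=0.3):
    """Greedy NMS with a depth gate (assumed simplification, not the paper's formula).

    A box is suppressed only if it overlaps an already kept box by more than
    iou_thr AND their mean depths differ by less than depth_tol.
    Returns the indices of the kept boxes, highest score first.
    """
    depths = [mean_depth(depth, b) for b in boxes]
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        same_object = any(
            iou(boxes[i], boxes[k]) > iou_thr
            and abs(depths[i] - depths[k]) < depth_tol
            for k in keep
        )
        if not same_object:
            keep.append(i)
    return keep
```

Setting `depth_tol` to infinity disables the gate and reduces the sketch to ordinary Greedy-NMS, which makes the effect of the depth cue easy to compare on the same detections.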
