论文信息 - Led: Localization-Quality Estimation Embedded Detector

Led: Localization-Quality Estimation Embedded Detector

Classification subnetwork and box regression subnetwork are essential components in deep networks for object detection. However, we observe a contradiction that before NMS, some better localized detections do not correspond to higher classification confidences, and vice versa. This contradiction exists because classification confidences can not fully reflect the localization-quality (loc-quality) of each detection. In this work, we propose the Localization-quality Estimation embedded Detector abbreviated as LED, and a corresponding detection pipeline. In this detection pipeline, we first propose an accurate loc-quality estimation method for each detection, then combine the loc-quality with the corresponding classification confidence during inference to make each detection more reasonable and accurate. For efficiency, LED is designed as an one-stage network. Extensive experiments are conducted on Pascal VOC 2007 and KITTI car detection datasets to demonstrate the effectiveness of LED.

[1] Thierry Chateau,et al. Deep MANTA: A Coarse-to-Fine Many-Task Network for Joint 2D and 3D Vehicle Analysis from Monocular Image , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Ali Farhadi,et al. YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Kaiming He,et al. Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[4] Abhinav Gupta,et al. Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Fuchun Sun,et al. RON: Reverse Connection with Objectness Prior Networks for Object Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[7] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Rogério Schmidt Feris,et al. A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection , 2016, ECCV.

[9] Silvio Savarese,et al. Subcategory-Aware Convolutional Neural Networks for Object Proposals and Detection , 2016, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[10] Yi Li,et al. R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[11] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.

[12] Kavita Bala,et al. Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Thomas A. Funkhouser,et al. Dilated Residual Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[15] Kaiming He,et al. Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Jianguo Zhang,et al. The PASCAL Visual Object Classes Challenge , 2006 .

[17] Vladlen Koltun,et al. Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[18] Wei Liu,et al. DSSD : Deconvolutional Single Shot Detector , 2017, ArXiv.

[19] Andreas Geiger,et al. Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[20] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[21] Kaiming He,et al. Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[22] Silvio Savarese,et al. Data-driven 3D Voxel Patterns for object category recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.

[24] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25] Jian Sun,et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.