Robust object proposals re-ranking for object detection in autonomous driving using convolutional neural networks

Object proposals have recently emerged as an essential cornerstone for object detection. The current state-of-the-art object detectors employ object proposals to detect objects within a modest set of candidate bounding box proposals instead of exhaustively searching across an image using the sliding window approach. However, achieving high recall and good localization with few proposals is still a challenging problem. The challenge becomes even more difficult in the context of autonomous driving, in which small objects, occlusion, shadows, and reflections usually occur. In this paper, we present a robust object proposals re-ranking algorithm that effectivity re-ranks candidates generated from a customized class-independent 3DOP (3D Object Proposals) method using a two-stream convolutional neural network (CNN). The goal is to ensure that those proposals that accurately cover the desired objects are amongst the few top-ranked candidates. The proposed algorithm, which we call DeepStereoOP, exploits not only RGB images as in the conventional CNN architecture, but also depth features including disparity map and distance to the ground. Experiments show that the proposed algorithm outperforms all existing object proposal algorithms on the challenging KITTI benchmark in terms of both recall and localization. Furthermore, the combination of DeepStereoOP and Fast R-CNN achieves one of the best detection results of all three KITTI object classes. HighlightsWe present a robust object proposals re-ranking algorithm for object detection in autonomous driving.Both RGB images and depth features are included in the proposed two-stream CNN architecture called DeepStereoOP.Initial object proposals are generated from a customized class-independent 3DOP method.Experiments show that the proposed algorithm outperforms all existing object proposals algorithms.The combination of DeepStereoOP and Fast R-CNN achieves one of the best detection results on KITTI benchmark.

[1]  Huimin Ma,et al.  3D Object Proposals for Accurate Object Class Detection , 2015, NIPS.

[2]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[3]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[5]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[6]  Mohan M. Trivedi,et al.  Learning to Detect Vehicles by Clustering Appearance Patterns , 2015, IEEE Transactions on Intelligent Transportation Systems.

[7]  Zehang Sun,et al.  On-road vehicle detection: a review , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Peter V. Gehler,et al.  Multi-View and 3D Deformable Part Models , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  BebisGeorge,et al.  On-Road Vehicle Detection , 2006 .

[10]  Jae Wook Jeon,et al.  Robust non-local stereo matching for outdoor driving images using segment-simple-tree , 2015, Signal Process. Image Commun..

[11]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[12]  Cewu Lu,et al.  Contour Box: Rejecting Object Proposals without Explicit Closed Contours , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[13]  Cristian Sminchisescu,et al.  CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Peter V. Gehler,et al.  Occlusion Patterns for Object Class Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Raquel Urtasun,et al.  Efficient Joint Segmentation, Occlusion Labeling, Stereo and Flow Estimation , 2014, ECCV.

[16]  Philip H. S. Torr,et al.  BING: Binarized normed gradients for objectness estimation at 300fps , 2019, Computational Visual Media.

[17]  Sven J. Dickinson,et al.  Learning to Combine Mid-Level Cues for Object Proposal Generation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[18]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[19]  Koen E. A. van de Sande,et al.  Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.

[20]  Jitendra Malik,et al.  DeepBox: Learning Objectness with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[21]  Jonathan Warrell,et al.  Proposal generation for object detection using cascaded ranking SVMs , 2011, CVPR 2011.

[22]  Andreas Geiger,et al.  Joint 3D Estimation of Objects and Scene Layout , 2011, NIPS.

[23]  Derek Hoiem,et al.  Category Independent Object Proposals , 2010, ECCV.

[24]  Ronan Collobert,et al.  Learning to Segment Object Candidates , 2015, NIPS.

[25]  Thomas Deselaers,et al.  Measuring the Objectness of Image Windows , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Bernt Schiele,et al.  Taking a deeper look at pedestrians , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Andreas Geiger,et al.  Displets: Resolving stereo ambiguities using object knowledge , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Vladlen Koltun,et al.  Geodesic Object Proposals , 2014, ECCV.

[30]  Luis Miguel Bergasa,et al.  Supervised learning and evaluation of KITTI's cars detector with DPM , 2014, 2014 IEEE Intelligent Vehicles Symposium Proceedings.

[31]  Matthew B. Blaschko,et al.  Learning a category independent object detection cascade , 2011, 2011 International Conference on Computer Vision.

[32]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Luc Van Gool,et al.  Seeking the Strongest Rigid Detector , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Song-Chun Zhu,et al.  Integrating Context and Occlusion for Car Detection by Hierarchical And-Or Model , 2014, ECCV.

[35]  Esa Rahtu,et al.  Generating Object Segmentation Proposals Using Global and Local Search , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Cristian Sminchisescu,et al.  Constrained parametric min-cuts for automatic object segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[37]  Silvio Savarese,et al.  Data-driven 3D Voxel Patterns for object category recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Sanja Fidler,et al.  segDeepM: Exploiting segmentation and context in deep neural networks for object detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Huimin Ma,et al.  Improving object proposals with multi-thresholding straddling expansion , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  TrivediMohan Manubhai,et al.  Looking at Vehicles on the Road , 2013 .

[41]  Bernt Schiele,et al.  What Makes for Effective Detection Proposals? , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Derek Hoiem,et al.  Category-Independent Object Proposals with Diverse Ranking , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Baolin Yin,et al.  Cracking BING and Beyond , 2014, BMVC.

[44]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[45]  Francis Eng Hock Tay,et al.  Scale-Aware Pixelwise Object Proposal Networks , 2016, IEEE Transactions on Image Processing.

[46]  Dumitru Erhan,et al.  Scalable, High-Quality Object Detection , 2014, ArXiv.

[47]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[48]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[49]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[50]  Sanja Fidler,et al.  Monocular 3D Object Detection for Autonomous Driving , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Venkatesh Saligrama,et al.  BING++: A Fast High Quality Object Proposal Generator at 100fps , 2015, ArXiv.

[53]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54]  Fatih Murat Porikli,et al.  Fast Detection of Multiple Objects in Traffic Scenes With a Common Detection Framework , 2015, IEEE Transactions on Intelligent Transportation Systems.

[55]  Dumitru Erhan,et al.  Scalable Object Detection Using Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[56]  Luc Van Gool,et al.  DeepProposal: Hunting Objects by Cascading Deep Convolutional Layers , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[57]  Yann LeCun,et al.  Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches , 2015, J. Mach. Learn. Res..

[58]  Mohan M. Trivedi,et al.  Looking at Vehicles on the Road: A Survey of Vision-Based Vehicle Detection, Tracking, and Behavior Analysis , 2013, IEEE Transactions on Intelligent Transportation Systems.

[59]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[60]  Nuno Vasconcelos,et al.  Learning Complexity-Aware Cascades for Deep Pedestrian Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[61]  Xiaogang Wang,et al.  Deep Learning Strong Parts for Pedestrian Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[62]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[63]  Bernt Schiele,et al.  Filtered channel features for pedestrian detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Thomas Deselaers,et al.  What is an object? , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[65]  James M. Rehg,et al.  RIGOR: Reusing Inference in Graph Cuts for Generating Object Regions , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[66]  Daniel P. Huttenlocher,et al.  Efficient Graph-Based Image Segmentation , 2004, International Journal of Computer Vision.

[67]  Santiago Manen,et al.  Prime Object Proposals with Randomized Prim's Algorithm , 2013, 2013 IEEE International Conference on Computer Vision.

[68]  Bernt Schiele,et al.  Ten Years of Pedestrian Detection, What Have We Learned? , 2014, ECCV Workshops.

[69]  Vladlen Koltun,et al.  Learning to propose objects , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[70]  Ming Yang,et al.  Regionlets for Generic Object Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[71]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[72]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[73]  Bernt Schiele,et al.  How good are detection proposals, really? , 2014, BMVC.