A new object proposal generation method for object detection in RGB-D data

This paper proposes a modified selective search method that generates object proposals on RGB-D data in indoor scenes. The proposed method first applies color flattening to generate monotonous color variations in RGB image data. Then, from the color-flattened image and depth map data, cost function-based segment grouping and depth segmentation are applied to produce desirable segmentation results. Segment grouping using cost function on image data computes dissimilarities in color, texture, and size between two adjacent regions with pre-learned weights. Depth segmentation uses the height difference of grid cells in the binned depth grid map. The final set of object proposal regions extracted from the RGB image and depth map data is organized by considering the overlapping between two data modalities. Finally, the extracted set of object proposal regions is fed into AlexNet or VGG-16, both of which are widely used for object classification, to evaluate our method on object detection and classification tasks. The proposed segment-based method can precisely detect meaningful object regions using a smaller number of proposals than other methods. Further, its detection and classification performance are better than those of previous methods.

[1]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[2]  J. Reitberger,et al.  Automatic car detection in high resolution urban scenes based on an adaptive 3D-model , 2003, 2003 2nd GRSS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas.

[3]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[4]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[5]  Bernt Schiele,et al.  New features and insights for pedestrian detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  Tomaso A. Poggio,et al.  Pedestrian detection using wavelet templates , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Dieter Fox,et al.  A large-scale hierarchical multi-view RGB-D object dataset , 2011, 2011 IEEE International Conference on Robotics and Automation.

[8]  Christian Goerick,et al.  Artificial neural networks in real-time car detection and tracking applications , 1996, Pattern Recognit. Lett..

[9]  Huimin Ma,et al.  3D Object Proposals for Accurate Object Class Detection , 2015, NIPS.

[10]  Federico Girosi,et al.  Training support vector machines: an application to face detection , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11]  Bing-Fei Wu,et al.  Nighttime Vehicle Detection for Driver Assistance and Autonomous Vehicles , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[12]  Johan A. K. Suykens,et al.  Least Squares Support Vector Machine Classifiers , 1999, Neural Processing Letters.

[13]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[14]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Takeo Kanade,et al.  Neural Network-Based Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Thomas Deselaers,et al.  Measuring the Objectness of Image Windows , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Cristiano Premebida,et al.  Pedestrian detection combining RGB and dense LIDAR data , 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[18]  Tom Goldstein,et al.  The Split Bregman Method for L1-Regularized Problems , 2009, SIAM J. Imaging Sci..

[19]  Paul A. Viola,et al.  Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade , 2001, NIPS.

[20]  Song-Chun Zhu,et al.  Integrating Context and Occlusion for Car Detection by Hierarchical And-Or Model , 2014, ECCV.

[21]  Anil K. Jain,et al.  Face Detection in Color Images , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[22]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[23]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[24]  Dariu Gavrila,et al.  Pedestrian Detection from a Moving Vehicle , 2000, ECCV.

[25]  Xiangyu Zhu,et al.  Object detection by labeling superpixels , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Kai Oliver Arras,et al.  People detection in RGB-D data , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[27]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[28]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[29]  Derek Hoiem,et al.  Category Independent Object Proposals , 2010, ECCV.

[30]  智一 吉田,et al.  Efficient Graph-Based Image Segmentationを用いた圃場図自動作成手法の検討 , 2014 .

[31]  Liang Zhao,et al.  Stereo- and neural network-based pedestrian detection , 2000, IEEE Trans. Intell. Transp. Syst..

[32]  Antonio M. López,et al.  Road Detection Based on Illuminant Invariance , 2011, IEEE Transactions on Intelligent Transportation Systems.

[33]  Yizhou Yu,et al.  An L1 image transform for edge-preserving smoothing and scene-level intrinsic decomposition , 2015, ACM Trans. Graph..