Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses

In this paper we focus on the problem of detecting objects in 3D from RGB-D images. We propose a novel framework that explores the compatibility between segmentation hypotheses of the object in the image and the corresponding 3D map. Our framework allows to discover the optimal location of the object using a generalization of the structural latent SVM formulation in 3D as well as the definition of a new loss function defined over the 3D space in training. We evaluate our method using two existing RGB-D datasets. Extensive quantitative and qualitative experimental results show that our proposed approach outperforms state-of-the-art as methods well as a number of baseline approaches for both 3D and 2D object recognition tasks.

[1]  Alexei A. Efros,et al.  Putting Objects in Perspective , 2006, CVPR.

[2]  Ronen Basri,et al.  Image Segmentation by Probabilistic Bottom-Up Aggregation and Cue Integration , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Fei-Fei Li,et al.  Spatially Coherent Latent Topic Model for Concurrent Segmentation and Classification of Objects and Scenes , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[4]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[5]  Andrew Zisserman,et al.  Multiple kernels for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[6]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[8]  Philip H. S. Torr,et al.  What, Where and How Many? Combining Object Detectors and CRFs , 2010, ECCV.

[9]  Luc Van Gool,et al.  Hough Transform and 3D SURF for Robust Three Dimensional Classification , 2010, ECCV.

[10]  Cristian Sminchisescu,et al.  Constrained parametric min-cuts for automatic object segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11]  Silvio Savarese,et al.  Depth-Encoded Hough Voting for Joint Object Detection and Shape Recovery , 2010, ECCV.

[12]  Trevor Darrell,et al.  Size Matters: Metric Visual Search Constraints from Monocular Metadata , 2010, NIPS.

[13]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.

[14]  Dieter Fox,et al.  A large-scale hierarchical multi-view RGB-D object dataset , 2011, 2011 IEEE International Conference on Robotics and Automation.

[15]  Pietro Perona,et al.  Object detection and segmentation from joint embedding of parts and pixels , 2011, 2011 International Conference on Computer Vision.

[16]  Dieter Fox,et al.  Object recognition with hierarchical kernel descriptors , 2011, CVPR 2011.

[17]  Jonathan T. Barron,et al.  A category-level 3-D object dataset: Putting the Kinect to work , 2011, ICCV Workshops.

[18]  Wolfram Burgard,et al.  Point feature extraction on 3D range scans taking into account object boundaries , 2011, 2011 IEEE International Conference on Robotics and Automation.

[19]  Thorsten Joachims,et al.  Semantic Labeling of 3D Point Clouds for Indoor Scenes , 2011, NIPS.

[20]  Silvio Savarese,et al.  Toward coherent object detection and scene layout understanding , 2011, Image Vis. Comput..

[21]  Daniel P. Huttenlocher,et al.  Distance Transforms of Sampled Functions , 2012, Theory Comput..

[22]  Silvio Savarese,et al.  Toward coherent object detection and scene layout understanding , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[23]  Silvio Savarese,et al.  Estimating the aspect layout of object categories , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Peter V. Gehler,et al.  3D2PM - 3D Deformable Part Models , 2012, ECCV.

[25]  Silvio Savarese,et al.  Relating Things and Stuff by High-Order Potential Modeling , 2012, ECCV Workshops.

[26]  Peter V. Gehler,et al.  Teaching 3D geometry to deformable part models , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[28]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.