Paper information - Detection-based object labeling in 3D scenes

We propose a view-based approach for labeling objects in 3D scenes reconstructed from RGB-D (color+depth) videos. We utilize sliding window detectors trained from object views to assign class probabilities to pixels in every RGB-D frame. These probabilities are projected into the reconstructed 3D scene and integrated using a voxel representation. We perform efficient inference on a Markov Random Field over the voxels, combining cues from view-based detection and 3D shape, to label the scene. Our detection-based approach produces accurate scene labeling on the RGB-D Scenes Dataset and improves the robustness of object detection.
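The integration step described above can be illustrated with a minimal sketch. It assumes that each labeled pixel has already been back-projected to a 3D scene point, and that a voxel's score for a class is the mean log-probability of the points falling inside it. That averaging rule, the function names, and the 1 cm voxel size are assumptions made for illustration; the abstract does not specify them. The sketch covers only the detection-based (unary) cue: the 3D shape cue and the Markov Random Field inference are omitted.

```python
import math
from collections import defaultdict


def voxel_key(point, voxel_size):
    """Map a 3D point (x, y, z) to integer voxel coordinates."""
    return tuple(int(math.floor(c / voxel_size)) for c in point)


def integrate_detections(points, probs, voxel_size=0.01, eps=1e-6):
    """Accumulate per-point class probabilities into voxels.

    points: list of (x, y, z) scene coordinates, one per labeled pixel.
    probs:  list of per-class probability lists, aligned with points.
    Returns {voxel: list of mean log-probabilities, one per class}.
    The mean-log-probability rule is an assumption, not the paper's stated method.
    """
    sums = {}
    counts = defaultdict(int)
    for point, p in zip(points, probs):
        key = voxel_key(point, voxel_size)
        if key not in sums:
            sums[key] = [0.0] * len(p)
        for k, pk in enumerate(p):
            # Clamp to eps so a zero probability does not produce -infinity.
            sums[key][k] += math.log(max(pk, eps))
        counts[key] += 1
    return {key: [s / counts[key] for s in acc] for key, acc in sums.items()}


def label_voxels(voxel_scores):
    """Pick the highest-scoring class index per voxel (no MRF smoothing)."""
    return {
        key: max(range(len(scores)), key=scores.__getitem__)
        for key, scores in voxel_scores.items()
    }


# Hypothetical input: three points with two-class probabilities.
points = [(0.001, 0.002, 0.003), (0.004, 0.001, 0.002), (0.051, 0.0, 0.0)]
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
labels = label_voxels(integrate_detections(points, probs))
```

Here the first two points fall into the same voxel and their evidence is pooled, while the third point lands in a separate voxel. A full pipeline would feed these per-voxel scores into an MRF as unary terms and add pairwise terms between neighboring voxels before labeling.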
