Contextually guided semantic labeling and search for three-dimensional point clouds

RGB-D cameras, which provide an RGB image together with per-pixel depth, are becoming increasingly popular for robotic perception. In this paper, we address the task of detecting commonly found objects in the three-dimensional (3D) point clouds of indoor scenes obtained from such cameras. Our method uses a graphical model that captures various features and contextual relations, including local visual appearance and shape cues, object co-occurrence relationships and geometric relationships. With a large number of object classes and relations, the model's parsimony becomes important, and we address this by using multiple types of edge potentials. We train the model using a maximum-margin learning approach. In our experiments on a total of 52 3D scenes of homes and offices (composed from about 550 views), we achieve a labeling performance of 84.06% on office scenes and 73.38% on home scenes, with 17 object classes each. We also present a method for a robot to search for an object using the learned model and the contextual information available from the current labelings of the scene. We successfully applied this algorithm on a mobile robot for the task of finding 12 object classes in 10 different offices and achieved a precision of 97.56% with 78.43% recall.
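The abstract describes a graphical model in which each segment's label is scored by node potentials (local appearance and shape cues) and pairs of segments are scored by several types of edge potentials (co-occurrence and geometric relations). The following is a minimal Python sketch of that kind of scoring function, under stated assumptions: potentials are linear in the features, weights are stored per label for nodes and per label pair for each edge type, the edge type `on_top` and all toy weights are hypothetical, and inference uses iterated conditional modes (ICM), a simple local-search method swapped in for illustration rather than the inference used in the paper. In the paper the weights are learned with a maximum-margin approach; here they are set by hand.

```python
# Sketch only: linear potentials and ICM inference are illustrative
# assumptions, not the paper's implementation.
import numpy as np


def labeling_score(labels, node_feats, edges, w_node, w_edge):
    """Score of a full labeling: linear node potentials plus typed edge potentials.

    labels     : sequence of label indices, one per segment
    node_feats : array (n_segments, d_node) of per-segment features
    edges      : list of (i, j, edge_type, phi), phi an array (d_edge,)
    w_node     : array (n_labels, d_node), one weight vector per label
    w_edge     : dict edge_type -> array (n_labels, n_labels, d_edge)
    """
    score = 0.0
    for i, y in enumerate(labels):
        # node term: how well segment i's own features fit label y
        score += float(w_node[y] @ node_feats[i])
    for i, j, etype, phi in edges:
        # edge term: separate weights per edge type and per (y_i, y_j) pair
        score += float(w_edge[etype][labels[i], labels[j]] @ phi)
    return score


def icm(node_feats, edges, w_node, w_edge, n_iters=10):
    """Iterated conditional modes: greedy coordinate ascent on labeling_score."""
    n_labels = w_node.shape[0]
    # start from the best label for each segment under node terms alone
    labels = [int(k) for k in np.argmax(node_feats @ w_node.T, axis=1)]
    for _ in range(n_iters):
        changed = False
        for i in range(len(labels)):
            old = labels[i]
            best_y = old
            best_s = labeling_score(labels, node_feats, edges, w_node, w_edge)
            for y in range(n_labels):
                labels[i] = y
                s = labeling_score(labels, node_feats, edges, w_node, w_edge)
                if s > best_s:  # strict: ties keep the current label
                    best_y, best_s = y, s
            labels[i] = best_y
            changed = changed or best_y != old
        if not changed:  # no single-segment change improves the score
            break
    return labels


# Toy example (hypothetical): label 0 = "table", label 1 = "monitor".
node_feats = np.array([[1.0, 0.0],     # segment 0 looks clearly like a table
                       [0.55, 0.45]])  # segment 1 slightly prefers "table"
w_node = np.eye(2)
# One hypothetical "on_top" edge: segment 1 lies on top of segment 0.
edges = [(1, 0, "on_top", np.array([1.0]))]
w_edge = {"on_top": np.zeros((2, 2, 1))}
w_edge["on_top"][1, 0, 0] = 0.5  # reward "monitor on top of table"

labels = icm(node_feats, edges, w_node, w_edge)
print(labels)  # [0, 1]: the edge term flips segment 1 to "monitor"
```

Keeping a separate weight tensor per edge type lets each relation carry its own parameters, which is one way to think about the parsimony trade-off the abstract mentions. ICM only reaches a local optimum, so it can miss labelings that exact or relaxation-based inference would find.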