An interactive approach to semantic modeling of indoor scenes with an RGBD camera

We present an interactive approach to semantic modeling of indoor scenes with a consumer-level RGBD camera. Using our approach, the user first takes an RGBD image of an indoor scene, which is automatically segmented into a set of regions with semantic labels. If the segmentation is not satisfactory, the user can draw some strokes to guide the algorithm to achieve better results. After the segmentation is finished, the depth data of each semantic region is used to retrieve a matching 3D model from a database. Each model is then transformed according to the image depth to yield the scene. For large scenes where a single image can only cover one part of the scene, the user can take multiple images to construct other parts of the scene. The 3D models built for all images are then transformed and unified into a complete scene. We demonstrate the efficiency and robustness of our approach by modeling several real-world scenes.

[1]  Reinhard Klein,et al.  Efficient RANSAC for Point‐Cloud Shape Detection , 2007, Comput. Graph. Forum.

[2]  David P. Dobkin,et al.  A search engine for 3D models , 2003, TOGS.

[3]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[4]  Daniel Huber,et al.  Using Context to Create Semantic 3D Models of Indoor Environments , 2010, BMVC.

[5]  Ke Xie,et al.  A search-classify approach for cluttered indoor scene understanding , 2012, ACM Trans. Graph..

[6]  Dieter Fox,et al.  Depth kernel descriptors for object recognition , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[7]  Martin A. Riedmiller,et al.  On the Applicability of Unsupervised Feature Learning for Object Recognition in RGB-D Data , 2011 .

[8]  Leonidas J. Guibas,et al.  Shape google: Geometric words and expressions for invariant shape retrieval , 2011, TOGS.

[9]  陈宝权 GlobFit: Consistently Fitting Primitives by Discovering Global Relations , 2011 .

[10]  Richard Szeliski,et al.  Reconstructing building interiors from images , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[11]  N. Mitra,et al.  GlobFit: consistently fitting primitives by discovering global relations , 2011, SIGGRAPH 2011.

[12]  G DietterichThomas An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees , 2000 .

[13]  Dieter Fox,et al.  RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments , 2012, Int. J. Robotics Res..

[14]  Richard Szeliski,et al.  Manhattan-world stereo , 2009, CVPR.

[15]  Dieter Fox,et al.  Object Recognition in 3D Point Clouds Using Web Data and Domain Adaptation , 2010, Int. J. Robotics Res..

[16]  Wolfram Burgard,et al.  Monte Carlo Localization: Efficient Position Estimation for Mobile Robots , 1999, AAAI/IAAI.

[17]  Ross T. Whitaker,et al.  Indoor scene reconstruction from sets of noisy range images , 1999, Second International Conference on 3-D Digital Imaging and Modeling (Cat. No.PR00062).

[18]  Thorsten Joachims,et al.  Semantic Labeling of 3D Point Clouds for Indoor Scenes , 2011, NIPS.

[19]  Vladimir G. Kim,et al.  Shape-based recognition of 3D point clouds in urban environments , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[20]  Maneesh Agrawala,et al.  Interactive furniture layout using interior design guidelines , 2011, SIGGRAPH 2011.

[21]  Andrew W. Fitzgibbon,et al.  KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera , 2011, UIST.

[22]  Andrew E. Johnson,et al.  Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[23]  Andrew Zisserman,et al.  Multiple view geometry in computer visiond , 2001 .

[24]  Dieter Fox,et al.  A large-scale hierarchical multi-view RGB-D object dataset , 2011, 2011 IEEE International Conference on Robotics and Automation.

[25]  Nathan Silberman,et al.  Indoor scene segmentation using a structured light sensor , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[26]  Jonathan T. Barron,et al.  A category-level 3-D object dataset: Putting the Kinect to work , 2011, ICCV Workshops.

[27]  Thomas G. Dietterich An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization , 2000, Machine Learning.

[28]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[29]  Remco C. Veltkamp,et al.  A Survey of Content Based 3D Shape Retrieval Methods , 2004, SMI.

[30]  Dieter Fox,et al.  RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments , 2010, ISER.

[31]  Jian Sun,et al.  Lazy snapping , 2004, SIGGRAPH 2004.

[32]  Justin Clark,et al.  Object Digitization for Everyone , 2011, Computer.

[33]  Olga Veksler,et al.  Fast Approximate Energy Minimization via Graph Cuts , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[34]  Andrew W. Fitzgibbon,et al.  Bundle Adjustment - A Modern Synthesis , 1999, Workshop on Vision Algorithms.

[35]  Jean Ponce,et al.  Accurate, Dense, and Robust Multiview Stereopsis , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[37]  Thorsten Joachims,et al.  Contextually Guided Semantic Labeling and Search for 3D Point Clouds , 2011, ArXiv.

[38]  Leonidas J. Guibas,et al.  Acquiring 3D indoor environments with variability and repetition , 2012, ACM Trans. Graph..

[39]  Luc Van Gool,et al.  Real Time Head Pose Estimation from Consumer Depth Cameras , 2011, DAGM-Symposium.

[40]  Dieter Fox,et al.  Interactive 3D modeling of indoor environments with a consumer depth camera , 2011, UbiComp '11.

[41]  Harry Shum,et al.  Lazy snapping , 2004, ACM Trans. Graph..

[42]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[43]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[44]  Chi-Keung Tang,et al.  Make it home: automatic optimization of furniture arrangement , 2011, ACM Trans. Graph..