3D-Based Reasoning with Blocks, Support, and Stability

3D volumetric reasoning is important for truly understanding a scene. Humans can both segment each object in an image and perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause others to fall. We propose a new approach for parsing RGB-D images that uses 3D block units for volumetric reasoning. The algorithm fits image segments with 3D blocks and iteratively evaluates the scene based on block interaction properties. We produce a 3D representation of the scene by jointly optimizing over segmentation, block fitting, support relations, and object stability. Our algorithm incorporates the intuition that a good 3D representation of the scene is one that fits the data well and corresponds to a stable, self-supporting arrangement of objects, i.e., one that does not topple. We evaluate on several datasets, including controlled and real indoor scenes. The results show that our stability-reasoning framework improves both RGB-D segmentation and the volumetric representation of the scene.
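As a rough illustration of the joint optimization described above (a sketch, not the paper's exact formulation), the search can be viewed as minimizing an energy over a segmentation S, a set of block hypotheses B, and support relations R, where the data term measures how well the blocks explain the observed RGB-D segments and the remaining terms penalize unsupported or unstable arrangements; the terms E_data, E_support, E_stability and the weights \lambda_sup, \lambda_stab are illustrative assumptions, not notation from the paper:

E(S, B, R) = E_{\mathrm{data}}(S, B) + \lambda_{\mathrm{sup}}\, E_{\mathrm{support}}(B, R) + \lambda_{\mathrm{stab}}\, E_{\mathrm{stability}}(B, R)

(S^{*}, B^{*}, R^{*}) = \arg\min_{S, B, R} E(S, B, R)

Under this reading, the iterative procedure alternates between fitting blocks to the current segments, inferring which blocks support which, testing the stability of the resulting arrangement, and revising the segmentation where the configuration is penalized.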
