Estimating spatial layout of rooms from RGB-D videos

Estimating the spatial layout of indoor rooms plays an important role in many visual analysis applications, such as robotics and human-computer interaction. Although many methods for recovering the spatial layout of rooms have been proposed in recent years, their performance remains far from satisfactory because objects that clutter the scene cause heavy occlusion. In this paper, we propose a new approach that estimates the spatial layout of rooms from RGB-D videos. Unlike most existing methods, which estimate the layout from still images, our approach exploits the additional spatio-temporal and depth information that RGB-D videos provide, which supplies more context and helps improve estimation performance. Given an RGB-D video, we first estimate the spatial layout of the scene in each frame and compute the camera trajectory using a simultaneous localization and mapping (SLAM) algorithm. The per-frame layout estimates are then integrated to infer temporally consistent layouts of the room throughout the whole video. We evaluate our method on the NYU RGB-D dataset, and the experimental results show the efficacy of the proposed approach.
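The two-stage idea of per-frame estimation followed by integration along the camera trajectory can be illustrated with a minimal sketch. It is not the method of the paper, whose integration step the abstract does not specify. The sketch assumes that each per-frame layout is reduced to a single wall plane n·x = d in camera coordinates, that SLAM supplies a camera-to-world rotation R and translation t per frame, and that plain averaging in the world frame stands in for the integration step. All function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_vec(R, v):
    return [dot(row, v) for row in R]

def transpose(R):
    return [list(col) for col in zip(*R)]

def plane_to_world(n_c, d_c, R, t):
    # Camera-frame plane n_c . x = d_c, pose x_w = R x_c + t (assumed convention).
    n_w = mat_vec(R, n_c)
    return n_w, d_c + dot(n_w, t)

def plane_to_camera(n_w, d_w, R, t):
    # Inverse of plane_to_world for the same pose.
    n_c = mat_vec(transpose(R), n_w)
    return n_c, d_w - dot(n_w, t)

def fuse_planes(world_planes):
    # Assumption: average normals and offsets; a crude stand-in for real integration.
    sum_n = [0.0, 0.0, 0.0]
    sum_d = 0.0
    for n, d in world_planes:
        sum_n = [s + x for s, x in zip(sum_n, n)]
        sum_d += d
    norm = math.sqrt(dot(sum_n, sum_n))
    return [x / norm for x in sum_n], sum_d / len(world_planes)

def consistent_layouts(frame_planes, poses):
    # 1) Move every per-frame estimate into the shared world frame.
    world = [plane_to_world(n, d, R, t)
             for (n, d), (R, t) in zip(frame_planes, poses)]
    # 2) Fuse the estimates into one world-frame plane.
    n_w, d_w = fuse_planes(world)
    # 3) Project the fused plane back into each frame.
    return [plane_to_camera(n_w, d_w, R, t) for R, t in poses]
```

Calling `consistent_layouts` with noisy per-frame planes and their poses returns one plane per frame, all of which describe the same wall in world coordinates. Averaging offsets separately from normals is only reasonable when the per-frame normals already agree closely; a practical system would more likely use a robust or probabilistic estimator.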
