Human-Centric Indoor Environment Modeling from Depth Videos

We propose an approach to model indoor environments from depth videos recorded by a stationary camera. The approach extracts the 3-D spatial layout of the rooms and models the objects in them as 3-D cuboids. Unlike previous work, which relies purely on image appearance, we argue that indoor environment modeling should be human-centric: not only because humans are an important part of indoor environments, but also because the interaction between humans and their environment conveys a great deal of useful information about it. In this paper, we develop an approach that extracts physical constraints from human poses and motion to better recover the spatial layout and model the objects inside. We observe that the cues provided by human-environment interaction are very powerful: although we do not have much training data, our method still achieves promising performance. Our approach is built on depth videos, which makes it more user-friendly.
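The abstract does not specify which physical constraints are extracted from human poses. As an illustration only, the sketch below shows two simple cues that a tracked skeleton could plausibly provide: feet resting on the floor give an estimate of floor height, and space a body has passed through cannot be occupied by a solid object. Everything here is an assumption rather than the paper's method: the joint names (`foot_left`, `foot_right`, `head`), the y-up coordinate convention, the axis-aligned `Cuboid` (a simplification of general 3-D cuboids), and the `margin` value are all hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres; y assumed to point up (hypothetical convention)
Frame = Dict[str, Point]            # hypothetical skeleton: joint name -> 3-D position


def estimate_floor_height(frames: List[Frame]) -> float:
    """Estimate floor height as the median, over frames, of the lower foot joint.

    Assumes at least one foot touches the floor in most frames, so the median
    tolerates a few frames where the person jumps or a foot is mistracked.
    """
    lowest = [min(f["foot_left"][1], f["foot_right"][1]) for f in frames]
    return median(lowest)


@dataclass
class Cuboid:
    # Axis-aligned box given by its min and max corners (a simplification).
    lo: Point
    hi: Point

    def contains(self, p: Point, margin: float = 0.0) -> bool:
        # True if p lies strictly inside the box shrunk by `margin` on every side.
        return all(l + margin < c < h - margin for l, c, h in zip(self.lo, p, self.hi))


def free_space_violations(box: Cuboid, frames: List[Frame], margin: float = 0.05) -> int:
    """Count observed joints inside a cuboid hypothesis.

    A solid object cannot occupy space the body passed through, so a
    hypothesis with many violations is a poor fit to the observed motion.
    """
    return sum(box.contains(p, margin) for f in frames for p in f.values())
```

The median is used instead of the minimum so that one badly tracked foot does not drag the floor estimate down. For example, with per-frame lowest-foot heights of 0.02, 0.01 and 0.35 m (the last from a jump), the estimate is 0.02 m.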