Recovering free space of indoor scenes from a single image

In this paper we consider the problem of recovering the free space of an indoor scene from its single image. We show that exploiting the box like geometric structure of furniture and constraints provided by the scene, allows us to recover the extent of major furniture objects in 3D. Our “boxy” detector localizes box shaped objects oriented parallel to the scene across different scales and object types, and thus blocks out the occupied space in the scene. To localize the objects more accurately in 3D we introduce a set of specially designed features that capture the floor contact points of the objects. Image based metrics are not very indicative of performance in 3D. We make the first attempt to evaluate single view based occupancy estimates for 3D errors and propose several task driven performance measures towards it. On our dataset of 592 indoor images marked with full 3D geometry of the scene, we show that: (a) our detector works well using image based metrics; (b) our refinement method produces significant improvements in localization in 3D; and (c) if one evaluates using 3D metrics, our method offers major improvements over other single view based scene geometry estimation methods.

[1]  Derek Hoiem,et al.  Recovering the spatial layout of cluttered rooms , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[2]  T. Kanade,et al.  Geometric reasoning for single image structure recovery , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Alan L. Yuille,et al.  Manhattan World: compass direction from a single image by Bayesian inference , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[4]  Takeo Kanade,et al.  Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces , 2010, NIPS.

[5]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.

[6]  Charless C. Fowlkes,et al.  Discriminative Models for Multi-Class Object Layout , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[7]  Ashutosh Saxena,et al.  Learning Depth from Single Monocular Images , 2005, NIPS.

[8]  Alexei A. Efros,et al.  From 3D scene geometry to human workspace , 2011, CVPR 2011.

[9]  Wei Zhang,et al.  Video Compass , 2002, ECCV.

[10]  Joseph Schlecht,et al.  Sampling bedrooms , 2011, CVPR 2011.

[11]  Alexei A. Efros,et al.  Recovering Surface Layout from an Image , 2007, International Journal of Computer Vision.

[12]  Alexei A. Efros,et al.  Opportunistic Use of Vision to Push Back the Path-Planning Horizon , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[13]  Ali Farhadi,et al.  Recognition using visual phrases , 2011, CVPR 2011.

[14]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[15]  Jitendra Malik,et al.  Inferring spatial layout from a single image via depth-ordered grouping , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[16]  David A. Forsyth,et al.  Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry , 2010, ECCV.

[17]  David A. Forsyth,et al.  Rendering synthetic objects into legacy photographs , 2011, ACM Trans. Graph..

[18]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Carsten Rother,et al.  A New Approach for Vanishing Point Detection in Architectural Environments , 2000, BMVC.

[20]  KeeChang Lee,et al.  Fast Automatic Single-View 3-d Reconstruction of Urban Scenes , 2008, ECCV.