Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry

In this paper we show that a geometric representation of an object occurring in indoor scenes, along with rich scene structure can be used to produce a detector for that object in a single image. Using perspective cues from the global scene geometry, we first develop a 3D based object detector. This detector is competitive with an image based detector built using state-of-the-art methods; however, combining the two produces a notably improved detector, because it unifies contextual and geometric information. We then use a probabilistic model that explicitly uses constraints imposed by spatial layout - the locations of walls and floor in the image - to refine the 3D object estimates. We use an existing approach to compute spatial layout [1], and use constraints such as objects are supported by floor and can not stick through the walls. The resulting detector (a) has significantly improved accuracy when compared to the state-of-the-art 2D detectors and (b) gives a 3D interpretation of the location of the object, derived from a 2D image. We evaluate the detector on beds, for which we give extensive quantitative results derived from images of real scenes.

[1]  Tomaso A. Poggio,et al.  Example-Based Learning for View-Based Human Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Takeo Kanade,et al.  Neural Network-Based Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  Carsten Rother,et al.  A New Approach for Vanishing Point Detection in Architectural Environments , 2000, BMVC.

[4]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[5]  Zhuowen Tu,et al.  Image Parsing: Unifying Segmentation, Detection, and Recognition , 2005, International Journal of Computer Vision.

[6]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[7]  Antonio Torralba,et al.  Depth from Familiar Objects: A Hierarchical Model for 3D Scenes , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[8]  Alexei A. Efros,et al.  Putting Objects in Perspective , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9]  Alexei A. Efros,et al.  Recovering Surface Layout from an Image , 2007, International Journal of Computer Vision.

[10]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[11]  Ashutosh Saxena,et al.  Make3D: Learning 3D Scene Structure from a Single Still Image , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Derek Hoiem,et al.  3D LayoutCRF for Multi-View Object Class Recognition and Segmentation , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Andrew J. Davison,et al.  Active Matching , 2008, ECCV.

[14]  Jitendra Malik,et al.  Inferring spatial layout from a single image via depth-ordered grouping , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[15]  Ashutosh Saxena,et al.  Cascaded Classification Models: Combining Models for Holistic Scene Understanding , 2008, NIPS.

[16]  Luc Van Gool,et al.  Coupled Object Detection and Tracking from Static Cameras and Moving Vehicles , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Alexei A. Efros,et al.  Closing the loop in scene interpretation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  KeeChang Lee,et al.  Fast Automatic Single-View 3-d Reconstruction of Urban Scenes , 2008, ECCV.

[19]  T. Kanade,et al.  Geometric reasoning for single image structure recovery , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Derek Hoiem,et al.  Recovering the spatial layout of cluttered rooms , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[21]  Stephen Gould,et al.  Decomposing a scene into geometric and semantically consistent regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[22]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Silvio Savarese,et al.  Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[24]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.