论文信息 - Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry

Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry

In this paper we show that a geometric representation of an object occurring in indoor scenes, along with rich scene structure can be used to produce a detector for that object in a single image. Using perspective cues from the global scene geometry, we first develop a 3D based object detector. This detector is competitive with an image based detector built using state-of-the-art methods; however, combining the two produces a notably improved detector, because it unifies contextual and geometric information. We then use a probabilistic model that explicitly uses constraints imposed by spatial layout - the locations of walls and floor in the image - to refine the 3D object estimates. We use an existing approach to compute spatial layout [1], and use constraints such as objects are supported by floor and can not stick through the walls. The resulting detector (a) has significantly improved accuracy when compared to the state-of-the-art 2D detectors and (b) gives a 3D interpretation of the location of the object, derived from a 2D image. We evaluate the detector on beds, for which we give extensive quantitative results derived from images of real scenes.

[1] Tomaso A. Poggio,et al. Example-Based Learning for View-Based Human Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[2] Takeo Kanade,et al. Neural Network-Based Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[3] Carsten Rother,et al. A New Approach for Vanishing Point Detection in Architectural Environments , 2000, BMVC.

[4] Paul A. Viola,et al. Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[5] Zhuowen Tu,et al. Image Parsing: Unifying Segmentation, Detection, and Recognition , 2005, International Journal of Computer Vision.

[6] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[7] Antonio Torralba,et al. Depth from Familiar Objects: A Hierarchical Model for 3D Scenes , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[8] Alexei A. Efros,et al. Putting Objects in Perspective , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9] Alexei A. Efros,et al. Recovering Surface Layout from an Image , 2007, International Journal of Computer Vision.

[10] Antonio Torralba,et al. LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[11] Ashutosh Saxena,et al. Make3D: Learning 3D Scene Structure from a Single Still Image , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12] Derek Hoiem,et al. 3D LayoutCRF for Multi-View Object Class Recognition and Segmentation , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13] Andrew J. Davison,et al. Active Matching , 2008, ECCV.

[14] Jitendra Malik,et al. Inferring spatial layout from a single image via depth-ordered grouping , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[15] Ashutosh Saxena,et al. Cascaded Classification Models: Combining Models for Holistic Scene Understanding , 2008, NIPS.

[16] Luc Van Gool,et al. Coupled Object Detection and Tracking from Static Cameras and Moving Vehicles , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17] Alexei A. Efros,et al. Closing the loop in scene interpretation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[18] KeeChang Lee,et al. Fast Automatic Single-View 3-d Reconstruction of Urban Scenes , 2008, ECCV.

[19] T. Kanade,et al. Geometric reasoning for single image structure recovery , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[20] Derek Hoiem,et al. Recovering the spatial layout of cluttered rooms , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[21] Stephen Gould,et al. Decomposing a scene into geometric and semantically consistent regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[22] David A. McAllester,et al. Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23] Silvio Savarese,et al. Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[24] Alexei A. Efros,et al. Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.