A Discriminative Model for Learning Semantic and Geometric Interactions in Indoor Scenes

Visual scene understanding is a difficult problem that interleaves object detection, geometric reasoning, and scene classification. Consider the scene in Fig. 1(a). A scene classifier will tell you, with some uncertainty, that this is a dining room [6, 3]. A layout estimator [5, 7] will tell you, with different uncertainty, how to fit a box to the room. An object detector [8, 4] will tell you, with large uncertainty, that there is a dining table and four chairs. Each algorithm provides an important but uncertain and incomplete piece of information, because the scene is cluttered with objects that tend to occlude each other: the dining table occludes the chairs, the chairs occlude the dining table, and all of these occlude the room layout components (i.e., the walls and floor).

[1] Ali Farhadi, et al. Recognition using visual phrases, 2011, CVPR 2011.

[2] Takeo Kanade, et al. Geometric reasoning for single image structure recovery, 2009, CVPR.

[3] Pietro Perona, et al. A Bayesian hierarchical model for learning natural scene categories, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[4] David A. McAllester, et al. Object Detection with Discriminatively Trained Part Based Models, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5] Derek Hoiem, et al. Recovering the spatial layout of cluttered rooms, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[6] Silvio Savarese, et al. Understanding Indoor Scenes Using 3D Geometric Phrases, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[7] B. Schiele, et al. Combined Object Categorization and Segmentation With an Implicit Shape Model, 2004.

[8] Cordelia Schmid, et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).