Estimating scene typicality from human ratings and image features

Krista A. Ehinger (kehinger@mit.edu)
Department of Brain & Cognitive Sciences, MIT, 77 Massachusetts Ave., Cambridge, MA 02139 USA

Jianxiong Xiao (jxiao@csail.mit.edu)
Computer Science & Artificial Intelligence Laboratory, MIT, 77 Massachusetts Ave., Cambridge, MA 02139 USA

Antonio Torralba (torralba@csail.mit.edu)
Computer Science & Artificial Intelligence Laboratory, MIT, 77 Massachusetts Ave., Cambridge, MA 02139 USA

Aude Oliva (oliva@mit.edu)
Department of Brain & Cognitive Sciences, MIT, 77 Massachusetts Ave., Cambridge, MA 02139 USA

Abstract

Scenes, like objects, are visual entities that can be categorized into functional and semantic groups. One of the core concepts of human categorization is the idea that category membership is graded: some exemplars are more typical than others. Here, we obtain human typicality rankings for more than 120,000 images from 706 scene categories through an online rating task on Amazon Mechanical Turk. We use these rankings to identify the most typical examples of each scene category. Using computational models of scene classification based on global image features, we find that images rated as more typical examples of their category are more likely to be classified correctly. This indicates that the most typical scene examples contain the diagnostic visual features relevant to their categorization. Objectless, holistic representations of scenes might serve as a good basis for understanding how semantic categories are defined in terms of perceptual representations.

Keywords: scene perception; prototypes; categorization.

Introduction

Most theories of categorization and concepts agree that category membership is graded: some items are more typical examples of their category than others. For example, both sparrows and ostriches are birds, but a sparrow is generally regarded as a much more typical bird than an ostrich. The more typical examples of a category show many advantages in cognitive tasks. For example, typical examples are more readily named than atypical ones when people are asked to list examples of a category (e.g., furniture), and response times are faster for typical examples when people are asked to verify category membership (e.g., "a chair is a piece of furniture") (Rosch, 1975).

According to Prototype Theory, concepts are represented by their most typical examples (Rosch, 1971). These prototypes are an average or central tendency of all category members. People do not need to actually encounter the prototypical example of a category in order to form a concept of that category; instead, they extract the prototype through experience with the variation that exists within the category (Posner & Keele, 1968).

Environmental scenes, like objects, are visual entities that can be organized into functional and semantic groups. Like other conceptual categories, scenes contain more and less typical exemplars. Tversky and Hemenway (1983) identified some typical examples of indoor and outdoor scene categories, but the total number of scene categories used in their studies was very small. Here, we extend the idea of scene typicality to a very large database containing over 700 scene categories. The goal of the current study is two-fold: first, to determine the prototypical exemplars that best represent each visual scene category; and second, to evaluate the performance of state-of-the-art global-feature algorithms at classifying different types of exemplars.

Method

Dataset

Stimuli were taken from the SUN Database, a collection of 130,519 images organized into 899 categories (see Xiao, Hays, Ehinger, Oliva & Torralba, 2010). This database was constructed by first identifying all of the words in a dictionary corresponding to types of places, scenes, or environments (see Biederman, 1987, for a similar procedure with objects).
Our definition of a scene or place type was any concrete common noun which could reasonably complete the phrase, “I am in a place,” or “Let’s go to the place.” We included terms which referred to specific subtypes of scenes or sub-areas of scenes. However, we excluded specific places (like MIT or New York), terms which did not evoke a clear visual identity (like workplace or territory), spaces which were too small for a human body
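The prototype view described in the introduction lends itself to a simple computational illustration: treat a category prototype as the mean (central tendency) of its members' global feature vectors, score an image's typicality by its proximity to that mean, and classify by nearest prototype. This is a minimal sketch under stated assumptions, not the classifiers evaluated in the study; the random vectors below stand in for precomputed global descriptors (e.g., GIST-like features), and all function and variable names are hypothetical.

```python
import numpy as np

def category_prototype(features):
    """Prototype as the mean of category members' feature vectors."""
    return np.mean(features, axis=0)

def typicality_scores(features, prototype):
    """Typicality as negative Euclidean distance to the prototype."""
    return -np.linalg.norm(features - prototype, axis=1)

# Toy data: random "global feature" vectors standing in for two scene categories.
rng = np.random.default_rng(0)
beach = rng.normal(loc=0.0, scale=1.0, size=(50, 512))
forest = rng.normal(loc=3.0, scale=1.0, size=(50, 512))

protos = {"beach": category_prototype(beach),
          "forest": category_prototype(forest)}

def classify(x):
    """Nearest-prototype classification in feature space."""
    return min(protos, key=lambda c: np.linalg.norm(x - protos[c]))

# The paper's finding, in miniature: images nearer their category's
# prototype (higher typicality) should be the ones classified correctly.
scores = typicality_scores(beach, protos["beach"])
most_typical = beach[np.argmax(scores)]
print(classify(most_typical))
```

Here typicality and classification are deliberately tied to the same feature space, which mirrors the study's logic: if typical exemplars carry the diagnostic global features of their category, a feature-based classifier should succeed on them most often.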

[1] M. Posner, et al. On the genesis of abstract ideas, 1968, Journal of Experimental Psychology.

[2] Wayne D. Gray, et al. Basic objects in natural categories, 1976, Cognitive Psychology.

[3] B. Tversky, et al. Categories of environmental scenes, 1983, Cognitive Psychology.

[4] I. Biederman. Recognition-by-components: a theory of human image understanding, 1987, Psychological Review.

[5] Nancy Kanwisher, et al. A cortical representation of the local visual environment, 1998, Nature.

[6] M. J. Tarr, et al. What Object Attributes Determine Canonical Views?, 1999, Perception.

[7] Bernt Schiele, et al. A Semantic Typicality Measure for Natural Scene Categorization, 2004, DAGM-Symposium.

[8] Antonio Torralba, et al. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope, 2001, International Journal of Computer Vision.

[9] Pietro Perona, et al. A Bayesian hierarchical model for learning natural scene categories, 2005, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[10] Bill Triggs, et al. Histograms of oriented gradients for human detection, 2005, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11] Cordelia Schmid, et al. Software: Histogram of oriented gradient object detection, 2006.

[12] Cordelia Schmid, et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13] Bernt Schiele, et al. Semantic Modeling of Natural Scenes for Content-Based Image Retrieval, 2007, International Journal of Computer Vision.

[14] Eli Shechtman, et al. Matching Local Self-Similarities across Images and Videos, 2007, IEEE Conference on Computer Vision and Pattern Recognition.

[15] Michelle R. Greene, et al. Recognition of natural scenes from global properties: Seeing the forest without representing the trees, 2009, Cognitive Psychology.

[16] David A. McAllester, et al. Object Detection with Discriminatively Trained Part Based Models, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17] Jitendra Malik, et al. When is scene identification just texture recognition?, 2004, Vision Research.

[18] Aude Oliva, et al. Estimating perception of scene layout properties from global image features, 2011, Journal of Vision.

[19] Krista A. Ehinger, et al. SUN database: Large-scale scene recognition from abbey to zoo, 2010, IEEE Computer Society Conference on Computer Vision and Pattern Recognition.