Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope

In this paper, we propose a computational model of the recognition of real-world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low-dimensional representation of the scene, which we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. We then show that these dimensions can be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected close together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization, and that a holistic representation of the scene provides information about its probable semantic category.
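To make "spectral and coarsely localized information" more concrete, the following is a minimal sketch of one way such features could be computed. It is not the authors' estimation procedure. The function names (`energy_spectrum`, `orientation_energy`, `coarse_local_features`), the number of orientation bins, and the 2 x 2 grid are all assumptions chosen for illustration, and NumPy is assumed to be available. The sketch computes the energy spectrum of a grayscale image, pools it into orientation bins, and repeats this in each cell of a coarse grid so that the features keep a rough spatial layout.

```python
import numpy as np


def energy_spectrum(image):
    """Squared magnitude of the 2-D DFT, zero frequency shifted to the centre."""
    image = np.asarray(image, dtype=float)
    image = image - image.mean()  # drop the DC component (mean luminance)
    return np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2


def orientation_energy(spectrum, n_bins=4):
    """Sum spectral energy into n_bins orientation bins covering [0, pi)."""
    h, w = spectrum.shape
    # Frequency coordinates matching the fftshift-ed spectrum layout.
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    # Orientation of each frequency; opposite frequencies share one bin.
    theta = np.mod(np.arctan2(fy, fx), np.pi)
    bins = np.minimum((theta / np.pi * n_bins).astype(int), n_bins - 1)
    return np.bincount(bins.ravel(), weights=spectrum.ravel(), minlength=n_bins)


def coarse_local_features(image, grid=2, n_bins=4):
    """Orientation energies computed separately in each cell of a grid x grid
    partition (hypothetical stand-in for coarsely localized spectral features)."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    feats = []
    for rows in np.array_split(np.arange(h), grid):
        for cols in np.array_split(np.arange(w), grid):
            cell = image[np.ix_(rows, cols)]
            feats.append(orientation_energy(energy_spectrum(cell), n_bins))
    return np.concatenate(feats)


if __name__ == "__main__":
    # Synthetic test image: vertical stripes (intensity varies along x only).
    stripes = np.tile(np.sin(2 * np.pi * 4 * np.arange(32) / 32), (32, 1))
    print(orientation_energy(energy_spectrum(stripes)))
    print(coarse_local_features(stripes).shape)
```

For the vertical-stripe image, nearly all of the energy falls into the first orientation bin, and the grid version returns `grid * grid * n_bins` values. A model along the lines described in the abstract would then map features of this kind to the perceptual dimensions; that mapping is not shown here.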
