Semantic Modeling of Natural Scenes for Content-Based Image Retrieval

International Journal of Computer Vision (manuscript)

In this paper, we present a novel image representation that makes it possible to access natural scenes by local semantic description. Our work is motivated by the continuing effort in content-based image retrieval to extract and model the semantic content of images. The basic idea of the semantic modeling is to classify local image regions into semantic concept classes such as water, rocks, or foliage. Images are then represented by the frequency of occurrence of these local concepts. Through extensive experiments, we demonstrate that this image representation is well suited for modeling the semantic content of heterogeneous scene categories, and thus for categorization and retrieval. The image representation also allows us to rank natural scenes according to their semantic similarity relative to certain scene categories. Based on human ranking data, we learn a perceptually plausible distance measure that leads to a high correlation between the human and the automatically obtained typicality rankings. This result is especially valuable for content-based image retrieval, where the goal is to present retrieval results in order of descending semantic similarity to the query.
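The representation described above can be illustrated with a minimal sketch. It assumes that each image has already been divided into local regions and that each region carries a concept label from some classifier. The concept vocabulary, the function names, and the plain squared Euclidean distance are all assumptions made for illustration; in particular, the distance is only a placeholder for the perceptually plausible measure that is learned from human ranking data.

```python
from collections import Counter

# Hypothetical concept vocabulary; the text names water, rocks and foliage
# only as examples, and "sky" is added here purely for illustration.
CONCEPTS = ["water", "rocks", "foliage", "sky"]


def concept_occurrence_vector(region_labels):
    """Relative frequency of each concept among an image's local regions."""
    if not region_labels:
        raise ValueError("an image needs at least one labeled region")
    counts = Counter(region_labels)
    total = len(region_labels)
    return [counts[c] / total for c in CONCEPTS]


def category_prototype(vectors):
    """Mean concept-occurrence vector over example images of one category."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(CONCEPTS))]


def distance(vector, prototype):
    """Squared Euclidean distance (placeholder, not the learned measure)."""
    return sum((a - b) ** 2 for a, b in zip(vector, prototype))


def rank_by_typicality(images, prototype):
    """Image names ordered from most to least typical for the category."""
    return sorted(images, key=lambda name: distance(images[name], prototype))


# Usage: two toy images described by the labels of their local regions.
images = {
    "lake": concept_occurrence_vector(["water", "water", "rocks", "sky"]),
    "cliff": concept_occurrence_vector(["rocks", "rocks", "rocks", "sky"]),
}
water_scene = category_prototype([images["lake"]])
ranking = rank_by_typicality(images, water_scene)
```

Because every image is reduced to a short vector of concept frequencies, categorization and ranking both operate on the same compact description, and each vector entry remains interpretable as the share of the image covered by one concept.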
