High-Level Features for Image Indexing and Retrieval

Image indexing for content-based image retrieval is the process of automatically computing a compact representation (numerical or alphanumerical) of some attribute of a digital image, which is then used to derive information about the image contents. A feature, or attribute, can be related to a visual characteristic, but it may also be related to an interpretative response to an image or to a spatial, symbolic, semantic, or emotional characteristic. A feature may relate to a single attribute or be a composite representation of several attributes. Features can be classified as general-purpose or domain-dependent: general-purpose features can be used in any context, while domain-dependent features are designed specifically for a given application. Every feature is closely tied to the kind of information that it captures, so the choice of one feature over another depends on the application and on the kind (level) of information required.

As more complex and sophisticated general-purpose and domain-dependent features were developed, two important issues became evident: the sensory gap and the semantic gap. The sensory gap is the gap between the information in the real world and the information in a computational description derived from a digital recording of a scene (i.e., low-level or visual features such as color and texture). The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given application. As stated by Smeulders et al. (2000), the gap between the pictorial features and the image’s semantics makes it difficult for purely low-level content-based retrieval systems to obtain satisfactory results. High-level features try to bridge the semantic gap by embedding information about the image content in their representation. This goal can be pursued by different means, ranging from manually annotating the images with text to exploiting pattern recognition, computer vision, and machine learning algorithms.
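As a rough illustration of why low-level features alone leave a semantic gap, the following minimal sketch indexes images with a coarse color histogram and compares them by histogram intersection. The function names, the two-bins-per-channel quantization, and the toy pixel lists are all assumptions made for this example; they are not taken from the work summarized here.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=2):
    """Quantize RGB pixels (0-255 per channel) into a normalized histogram.

    The result is a compact numerical index of one visual attribute (color).
    """
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_id: n / total for bin_id, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


# Hypothetical toy "images" given as flat pixel lists (top rows first).
images = {
    "sky_over_grass": [(40, 90, 220)] * 8 + [(30, 180, 40)] * 8,
    "grass_over_sky": [(30, 180, 40)] * 8 + [(40, 90, 220)] * 8,
    "sunset": [(230, 120, 30)] * 12 + [(60, 20, 20)] * 4,
}

# Indexing step: one compact feature per image.
index = {name: color_histogram(pixels) for name, pixels in images.items()}

# Retrieval step: rank the collection against a query image.
query = index["sky_over_grass"]
ranking = sorted(
    ((histogram_intersection(query, feat), name) for name, feat in index.items()),
    reverse=True,
)
for score, name in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy setup the two images with swapped regions receive the same index, so a purely color-based system cannot tell a landscape from its upside-down counterpart. A high-level feature would need to encode something about the content (for instance, the output of a trained scene classifier) to separate them.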
