Automatic image annotation by using concept-sensitive salient objects for image content representation

Multi-level annotation of images is a promising way to enable more effective semantic image retrieval, because it allows keywords at different semantic levels to be used. In this paper, we propose a multi-level approach that annotates the semantics of natural scenes by using both the dominant image components and the relevant semantic concepts. In contrast to the well-known image-based and region-based approaches, we use salient objects as the dominant image components to achieve automatic image annotation at the content level. Building on this salient-object representation of image content, we develop a novel image classification technique to achieve automatic image annotation at the concept level. To detect the salient objects automatically, a set of detection functions is learned from labeled image regions by using Support Vector Machine (SVM) classifiers, with an automatic scheme for searching for the optimal model parameters. To generate the semantic concepts, finite mixture models are used to approximate the class distributions of the relevant salient objects, and an adaptive EM algorithm is proposed to determine the optimal model structure and model parameters simultaneously. We also demonstrate that the algorithms are effective at enabling multi-level annotation of natural scenes in a large-scale dataset.
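The abstract does not spell out the adaptive EM algorithm, so the following is only a rough sketch of the general idea of choosing a mixture's structure and its parameters together. It swaps in plain EM on one-dimensional Gaussian mixtures and selects the number of components by the Bayesian information criterion (BIC), which is a stand-in and not the method proposed in the paper. The function names (`fit_mixture`, `select_mixture`, `make_demo_data`), the variance floor, and the synthetic data are all hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """Density of the whole mixture at x."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_mixture(data, k, iters=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Deterministic start: means at evenly spaced quantiles of the data.
    ordered = sorted(data)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    floor = 1e-3 * overall_var + 1e-12  # assumed guard against collapsing variances

    for _ in range(iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, floor)

    log_lik = sum(math.log(mixture_pdf(x, weights, means, variances) or 1e-300) for x in data)
    return weights, means, variances, log_lik


def bic_score(log_lik, k, n):
    """BIC for a k-component 1-D mixture: k means, k variances, k - 1 free weights."""
    return -2.0 * log_lik + (3 * k - 1) * math.log(n)


def select_mixture(data, max_k=3):
    """Try k = 1..max_k and keep the fit with the lowest BIC.

    Returns (bic, k, weights, means, variances).
    """
    best = None
    for k in range(1, max_k + 1):
        weights, means, variances, log_lik = fit_mixture(data, k)
        bic = bic_score(log_lik, k, len(data))
        if best is None or bic < best[0]:
            best = (bic, k, weights, means, variances)
    return best


def make_demo_data(seed=0, per_cluster=150):
    """Synthetic feature values drawn from two well-separated Gaussians."""
    rng = random.Random(seed)
    low = [rng.gauss(0.0, 1.0) for _ in range(per_cluster)]
    high = [rng.gauss(10.0, 1.0) for _ in range(per_cluster)]
    return low + high


if __name__ == "__main__":
    bic, k, weights, means, variances = select_mixture(make_demo_data())
    print("selected components:", k)
    print("means:", [round(m, 2) for m in means])
```

In this sketch the model structure is chosen by refitting for each candidate number of components and comparing scores, whereas an adaptive scheme of the kind the abstract describes would adjust the structure during estimation. The one-dimensional data stands in for the visual features of a salient-object class.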
