Improve Image Annotation by Combining Multiple Models

Automatic image annotation is a promising methodology for image retrieval. However, most current annotation models are not yet sophisticated enough to produce high-quality annotations. Given an image, they often produce keywords that are irrelevant to its content, which is a primary obstacle to high-quality image retrieval. In this paper, we propose an approach that improves automatic image annotation in two directions. The first is to combine the annotation keywords produced by three classic underlying image annotation models (the translation model, the continuous-space relevance model, and the multiple Bernoulli relevance model), with the aim of increasing the number of potentially correct annotation keywords. The second is to remove keywords that are irrelevant to the image semantics, based on semantic similarity computed using WordNet. To verify the proposed hybrid annotation model, we carried out experiments on the widely used Corel image data set; the experimental results show that the proposed approach improves image annotation to some extent.
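The two-step idea in the abstract (merge the keywords of several annotation models, then prune semantically unrelated ones) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the fusion rule (summing per-model scores), the pruning rule (mean similarity to the other candidates below a threshold), the threshold value, the example scores, and the function `toy_similarity`, which stands in for a WordNet-based similarity measure.

```python
from collections import defaultdict

def combine_annotations(model_outputs, top_k=5):
    """Merge keyword scores from several models by summing them.

    Summation is an assumed fusion rule, chosen only for illustration.
    """
    combined = defaultdict(float)
    for scores in model_outputs:
        for word, score in scores.items():
            combined[word] += score
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(combined, key=lambda w: (-combined[w], w))
    return ranked[:top_k]

def prune_irrelevant(keywords, similarity, threshold=0.3):
    """Keep keywords whose mean similarity to the other candidates
    reaches the threshold (an assumed pruning rule)."""
    kept = []
    for word in keywords:
        others = [o for o in keywords if o != word]
        if not others:
            kept.append(word)
            continue
        mean_sim = sum(similarity(word, o) for o in others) / len(others)
        if mean_sim >= threshold:
            kept.append(word)
    return kept

# Hypothetical stand-in for a WordNet-based similarity measure.
NATURE_WORDS = {"sky", "beach", "water", "sand"}

def toy_similarity(a, b):
    return 0.6 if a in NATURE_WORDS and b in NATURE_WORDS else 0.1

# Hypothetical keyword scores from three annotation models for one image.
translation_model = {"sky": 0.4, "water": 0.3, "car": 0.2}
continuous_relevance = {"sky": 0.5, "beach": 0.3, "water": 0.1}
bernoulli_relevance = {"beach": 0.4, "sky": 0.3, "sand": 0.2}

candidates = combine_annotations(
    [translation_model, continuous_relevance, bernoulli_relevance]
)
final_keywords = prune_irrelevant(candidates, toy_similarity)
print(candidates)
print(final_keywords)
```

In this toy run the combination step enlarges the candidate set beyond what any single model proposes, and the pruning step discards "car" because it is only weakly related to the remaining keywords. In practice, `toy_similarity` would be replaced by a measure computed over WordNet.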
