A Semantic Similarity Language Model to Improve Automatic Image Annotation

With the rapid proliferation of digital images in recent years, the need to search and retrieve images accurately, efficiently, and conveniently has become more acute. Automatic annotation of images with their semantic content has attracted increasing attention because it is a preprocessing step for annotation-based image retrieval, which lets users retrieve images on the basis of image understanding. Various machine learning approaches have been applied to automatic image annotation; however, most of them focus on the relationship between images and annotation words and neglect the relationships among the annotation words themselves. In this paper, we propose a framework that uses language models to represent word-to-word relations and thereby improves the performance of existing image annotation approaches based on probabilistic models. We also propose a specific language model, the semantic similarity language model, which estimates the semantic similarity among annotation words so that annotations that are more semantically coherent have a higher probability of being chosen to annotate an image. To illustrate the general idea of using a language model to improve existing image annotation systems, we add the language model on top of two specific image annotation models: the translation model (TM) and the cross-media relevance model (CMRM). We test the improved models on a widely used image annotation corpus, the Corel 5K dataset. Our results show that adding the semantic similarity language model improves annotation performance significantly in comparison with the original models. The proposed language model can also be applied to other image annotation approaches that use either the word probability conditioned on the image or the word-image joint probability.
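The general idea of layering a word-to-word model on top of an existing annotation model can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the formulation used in the paper: the function name `annotate`, the linear interpolation with `weight`, the greedy selection, and all probabilities and similarity values are hypothetical. The base scores stand in for the word probabilities conditioned on the image that a model such as TM or CMRM would supply.

```python
def annotate(word_probs, similarity, k=3, weight=0.5):
    """Greedily pick k annotation words for one image.

    word_probs: dict word -> P(word | image), assumed to come from a base model.
    similarity: dict frozenset({w1, w2}) -> similarity in [0, 1] (toy values).
    weight:     assumed interpolation weight between base score and coherence.
    """
    chosen = []
    candidates = set(word_probs)
    while candidates and len(chosen) < k:
        def score(w):
            if not chosen:
                # No context yet: rely on the base model alone.
                return word_probs[w]
            # Average similarity between w and the words chosen so far.
            coherence = sum(
                similarity.get(frozenset((w, c)), 0.0) for c in chosen
            ) / len(chosen)
            return (1 - weight) * word_probs[w] + weight * coherence

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen


# Hypothetical base-model output for one image.
word_probs = {"tiger": 0.40, "building": 0.22, "grass": 0.20, "cat": 0.18}

# Hypothetical pairwise semantic similarities.
similarity = {
    frozenset(("tiger", "cat")): 0.9,
    frozenset(("tiger", "grass")): 0.6,
    frozenset(("grass", "cat")): 0.4,
    frozenset(("tiger", "building")): 0.1,
    frozenset(("building", "grass")): 0.2,
    frozenset(("building", "cat")): 0.1,
}

print(annotate(word_probs, similarity, weight=0.0))  # ['tiger', 'building', 'grass']
print(annotate(word_probs, similarity, weight=0.5))  # ['tiger', 'cat', 'grass']
```

With `weight=0.0` the sketch reduces to ranking by the base model alone, so the incoherent word "building" is kept; with a nonzero weight, words that are semantically closer to those already chosen are preferred.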
