Adaptive Model for Integrating Different Types of Associated Texts for Automated Annotation of Web Images

Web images are typically associated with various types of text on their corresponding Web pages, such as the image file name, ALT text, and surrounding text. It is well known that the semantics of Web images correlate strongly with these associated texts, which can therefore be used to infer image semantics. However, different types of associated texts may play different roles in deriving the semantics of Web content. Most previous work either treats the associated texts as a whole or assigns fixed weights to the different types according to prior knowledge or heuristics. In this paper, we propose a novel linear basis expansion-based approach to automatically annotate Web images based on their associated texts. In particular, we adaptively model the semantic contributions of the different types of associated texts using a piecewise penalty weighted regression model. We also demonstrate that social tagging data for Web images, such as Flickr's related tags, can be leveraged to further improve annotation performance. Experiments on a real Web image data set show that our approach significantly improves the performance of Web image annotation.
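
The core idea described above, scoring a candidate keyword from each type of associated text separately and learning how much each type contributes, can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the per-source relevance scores are taken as given, a plain L2-penalized least-squares fit stands in for the paper's piecewise penalty weighted regression model, and all function names, variable names, and parameter values are hypothetical.

```python
# Illustrative sketch only: combine per-source keyword relevance scores with
# weights learned by a simple penalized least-squares fit. The L2 penalty and
# gradient-descent fit are stand-ins, not the paper's actual regression model.
import numpy as np

# Assumed set of associated-text types for a Web image.
TEXT_SOURCES = ["file_name", "alt_text", "surrounding_text", "page_title"]

def fit_source_weights(scores, labels, penalty=0.1, lr=0.01, epochs=500):
    """Learn one weight per associated-text type.

    scores : (n_samples, n_sources) relevance of a keyword to an image,
             computed independently from each text source (assumed given).
    labels : (n_samples,) 1.0 if the keyword truly describes the image, else 0.0.
    """
    n_sources = scores.shape[1]
    w = np.full(n_sources, 1.0 / n_sources)      # start from uniform weights
    for _ in range(epochs):
        pred = scores @ w
        # Gradient of mean squared error plus an L2 penalty on the weights.
        grad = scores.T @ (pred - labels) / len(labels) + penalty * w
        w -= lr * grad
        w = np.clip(w, 0.0, None)                # keep contributions nonnegative
    return w

def annotate(image_scores, w, top_k=5):
    """Rank candidate keywords by the weighted combination of per-source scores."""
    combined = {kw: float(np.dot(src_scores, w)) for kw, src_scores in image_scores.items()}
    return sorted(combined, key=combined.get, reverse=True)[:top_k]

if __name__ == "__main__":
    # Synthetic demonstration data, not from the paper's experiments.
    rng = np.random.default_rng(0)
    X = rng.random((200, len(TEXT_SOURCES)))
    y = (X @ np.array([0.1, 0.4, 0.4, 0.1]) > 0.5).astype(float)
    w = fit_source_weights(X, y)
    print("learned source weights:", dict(zip(TEXT_SOURCES, np.round(w, 3))))
```

A usage note on the sketch: once the per-source weights are learned, annotating a new image amounts to calling annotate with a dictionary mapping each candidate keyword to its vector of per-source scores; social tagging data such as related tags could supply additional candidate keywords before ranking.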
