Automatic Generation of Semantic Fields for Annotating Web Images

The overwhelming amount of multimedia content has created a need to automatically detect the semantic concepts present in media. With the growth of photo-sharing websites such as Flickr, millions of images with user-supplied tags are now available. However, user tags tend to be noisy, ambiguous, and incomplete. To improve the quality of tags for annotating web images, we propose an approach that builds Semantic Fields. The main idea is that an image is more likely to be relevant to a given concept if several of its tags belong to the same Semantic Field as the target concept. A Semantic Field is a set of strongly semantically associated terms, identified through high tag co-occurrence in the image corpus and in external corpora and lexica such as WordNet and Wikipedia. Experiments on the NUS-WIDE web image corpus demonstrate superior annotation performance compared to state-of-the-art approaches.
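The core scoring idea can be illustrated with a minimal sketch. This is not the authors' implementation: it only uses tag co-occurrence within a toy corpus (the paper additionally draws on WordNet and Wikipedia), and all function names (`build_cooccurrence`, `semantic_field`, `relevance`) are hypothetical. A concept's field is approximated here as its top co-occurring tags, and an image is scored by how many of its tags fall in that field.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(tagged_images):
    """Count how often each pair of tags appears on the same image."""
    cooc = Counter()
    for tags in tagged_images:
        for a, b in combinations(sorted(set(tags)), 2):
            cooc[(a, b)] += 1
    return cooc

def semantic_field(concept, cooc, k=5):
    """Approximate a Semantic Field: the top-k tags most frequently
    co-occurring with the target concept in the corpus."""
    scores = Counter()
    for (a, b), n in cooc.items():
        if a == concept:
            scores[b] += n
        elif b == concept:
            scores[a] += n
    return {t for t, _ in scores.most_common(k)}

def relevance(image_tags, concept, field):
    """Score an image for a concept: how many of its tags lie in the
    concept's Semantic Field (or are the concept itself)."""
    return sum(1 for t in set(image_tags) if t in field or t == concept)

# Toy tagged corpus standing in for user-supplied Flickr tags.
images = [
    ["beach", "sand", "ocean", "sun"],
    ["beach", "ocean", "wave"],
    ["ocean", "wave", "surf"],
    ["city", "building"],
]
cooc = build_cooccurrence(images)
field = semantic_field("beach", cooc, k=10)   # {'ocean', 'sand', 'sun', 'wave'}
score = relevance(["ocean", "wave", "city"], "beach", field)  # 2
```

Under this scheme, an image tagged only "city" scores zero for "beach", while an image whose tags cluster inside the field scores highly, mirroring the paper's intuition that mutually associated tags reinforce a concept's relevance.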
