Visual tag dictionary: interpreting tags with visual words

Visual-word-based image representation has proven effective in a wide variety of applications such as image categorization, annotation, and search. By detecting keypoints in images and treating their patterns as visual words, an image can be represented as a bag of visual words, which is analogous to the bag-of-words representation of text documents. In this paper, we introduce a corpus named the visual tag dictionary. Unlike conventional dictionaries that define terms with textual words, the visual tag dictionary interprets each tag with visual words. The dictionary is constructed fully automatically by mining tagged image data on the Internet. With this dictionary, tags and images are connected via visual words, and many applications can thus be facilitated. As examples, we empirically demonstrate the effectiveness of the dictionary in tag-based image search, tag ranking, and image annotation.
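
The sketch below illustrates, under simple assumptions, how such a dictionary might be assembled: local descriptors (e.g., SIFT) are quantized into a visual-word vocabulary with k-means, each image becomes a normalized visual-word histogram, and each tag is then represented by the mean histogram of the images carrying it. Function names (`build_visual_vocabulary`, `bag_of_visual_words`, `build_visual_tag_dictionary`), the vocabulary size, and the use of a plain mean are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans


def build_visual_vocabulary(descriptors, num_words=50, seed=0):
    """Quantize local descriptors (e.g., SIFT) into a visual-word vocabulary."""
    kmeans = KMeans(n_clusters=num_words, random_state=seed, n_init=10)
    kmeans.fit(descriptors)
    return kmeans


def bag_of_visual_words(image_descriptors, vocabulary):
    """Represent one image as an L1-normalized histogram over visual words."""
    words = vocabulary.predict(image_descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)


def build_visual_tag_dictionary(tagged_images, vocabulary):
    """Interpret each tag as the mean visual-word histogram of images tagged with it.

    tagged_images: iterable of (descriptor_matrix, list_of_tags) pairs.
    """
    tag_hists = defaultdict(list)
    for descriptors, tags in tagged_images:
        hist = bag_of_visual_words(descriptors, vocabulary)
        for tag in tags:
            tag_hists[tag].append(hist)
    return {tag: np.mean(hists, axis=0) for tag, hists in tag_hists.items()}


# Toy usage with random 128-D "descriptors" standing in for real SIFT features.
rng = np.random.default_rng(0)
all_desc = rng.normal(size=(2000, 128))
vocab = build_visual_vocabulary(all_desc, num_words=50)
images = [(rng.normal(size=(100, 128)), ["beach", "sky"]),
          (rng.normal(size=(100, 128)), ["sky"])]
tag_dict = build_visual_tag_dictionary(images, vocab)
print(sorted(tag_dict), tag_dict["sky"].shape)  # -> ['beach', 'sky'] (50,)
```

With a dictionary of this form, one plausible way to score tag-image relevance is to compare a tag's visual-word histogram against an image's histogram (e.g., via cosine similarity or histogram intersection), which is the kind of connection between tags and images that supports applications such as tag-based search, tag ranking, and annotation.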
