Integrating visual and semantic contexts for topic network generation and word sense disambiguation

To support more effective search in large-scale, weakly tagged image collections, we have developed a novel algorithm that integrates the visual similarity contexts between images with the semantic similarity contexts between their tags for topic network generation and word sense disambiguation. First, a topic network is generated to characterize both the semantic and the visual similarity contexts between image topics more fully. By organizing large numbers of image topics according to their cross-modal inter-topic similarity contexts, the topic network makes the semantics behind the tag space more explicit, so that users can quickly gain insight into the collection and formulate their queries more precisely. Second, our word sense disambiguation algorithm uses the topic network to exploit both the visual similarity contexts between images and the semantic similarity contexts between their tags. This addresses polysemy and synonymy more effectively and can significantly improve the precision and recall of image retrieval. Our experiments on large-scale Flickr and LabelMe image collections have produced very positive results.
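
The two steps can be illustrated with a minimal sketch. It is not the algorithm evaluated in the paper, whose similarity measures are not given in this abstract. The sketch assumes, purely for illustration, that semantic similarity is approximated by the Jaccard overlap of the image sets tagged with each topic, that visual similarity is the cosine between one feature vector per topic, and that the two are mixed by a hypothetical weight `alpha` and pruned by a hypothetical `threshold`. All names, parameters, and toy data below are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Overlap of two image-id sets, used here as a stand-in for tag co-occurrence."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_topic_network(topics, alpha=0.5, threshold=0.3):
    """Return {(topic_a, topic_b): score} for topic pairs scoring at least `threshold`.

    `topics` maps a topic name to {"images": set of ids, "visual": feature vector}.
    The score is an assumed linear mix of semantic and visual similarity.
    """
    names = sorted(topics)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            semantic = jaccard(topics[a]["images"], topics[b]["images"])
            visual = cosine(topics[a]["visual"], topics[b]["visual"])
            score = alpha * semantic + (1 - alpha) * visual
            if score >= threshold:
                edges[(a, b)] = score
    return edges


def disambiguate(edges, candidate_senses, context_tags):
    """Pick the candidate sense with the strongest network links to the context tags."""
    def support(sense):
        return sum(
            weight
            for (a, b), weight in edges.items()
            if (a == sense and b in context_tags) or (b == sense and a in context_tags)
        )
    return max(candidate_senses, key=support)


# Toy data (hypothetical): two senses of the tag "apple" and two context topics.
topics = {
    "apple_fruit": {"images": {1, 2, 3}, "visual": [1.0, 0.0]},
    "orchard": {"images": {2, 3, 4}, "visual": [0.9, 0.1]},
    "apple_company": {"images": {5, 6}, "visual": [0.0, 1.0]},
    "laptop": {"images": {6, 7}, "visual": [0.1, 0.9]},
}
network = build_topic_network(topics)
sense_for_orchard = disambiguate(network, ["apple_fruit", "apple_company"], {"orchard"})
sense_for_laptop = disambiguate(network, ["apple_fruit", "apple_company"], {"laptop"})
```

On this toy data the network keeps only the two edges that link each sense of "apple" to its related topic, so an image that is also tagged "orchard" resolves to the fruit sense and one tagged "laptop" resolves to the company sense. A linear mix is only one way to combine the two modalities; it is used here because it keeps the contribution of each one easy to inspect.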
