Bridging the Semantic Gap Between Image Contents and Tags

With the exponential growth of Web 2.0 applications, tags have been used extensively to describe image contents on the Web. Because human-generated tags are noisy and sparse, understanding and utilizing them for image retrieval tasks has become an emerging research direction. Low-level visual features provide complementary information, so they are employed to improve image retrieval results. However, bridging the semantic gap between image contents and tags remains challenging. To attack this problem, we propose a unified framework built on a two-level data fusion between image contents and tags: 1) a unified graph is built to fuse the visual feature-based image similarity graph with the image-tag bipartite graph; 2) a novel random walk model is then proposed, which uses a fusion parameter to balance the influences of image contents and tags. Furthermore, the framework not only incorporates the pseudo relevance feedback process naturally, but can also be applied directly to content-based image retrieval, text-based image retrieval, and image annotation. Experimental analysis on a large Flickr dataset shows the effectiveness and efficiency of the proposed framework.
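
To make the two fusion levels concrete, the following is a minimal sketch on toy data and should not be read as the paper's actual model. It assumes that an image node follows visual-similarity edges with probability alpha and image-tag edges with probability 1 - alpha, and that the walk restarts at the query node with a fixed probability. The function names (normalize_rows, build_transition, random_walk), the damping value, and the toy matrices are all hypothetical choices made for illustration.

```python
def normalize_rows(matrix):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out


def build_transition(sim, img_tag, alpha):
    """Fuse an image similarity graph and an image-tag bipartite graph.

    Nodes 0..n-1 are images, nodes n..n+m-1 are tags.
    alpha is the (assumed) fusion parameter: the weight an image node
    gives to visual edges versus tag edges.
    Assumes every image has at least one neighbour and one tag, so that
    every row of the result sums to 1.
    """
    n, m = len(sim), len(img_tag[0])
    img_to_img = normalize_rows(sim)
    img_to_tag = normalize_rows(img_tag)
    # Transpose the bipartite matrix to get tag -> image edges.
    tag_to_img = normalize_rows(
        [[img_tag[i][t] for i in range(n)] for t in range(m)]
    )
    size = n + m
    P = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            P[i][j] = alpha * img_to_img[i][j]
        for t in range(m):
            P[i][n + t] = (1.0 - alpha) * img_to_tag[i][t]
    for t in range(m):
        for i in range(n):
            P[n + t][i] = tag_to_img[t][i]
    return P


def random_walk(P, start, damping=0.85, iters=50):
    """Random walk with restart (an assumed variant, power iteration)."""
    size = len(P)
    scores = list(start)
    for _ in range(iters):
        nxt = [(1.0 - damping) * start[k] for k in range(size)]
        for u in range(size):
            for v in range(size):
                nxt[v] += damping * scores[u] * P[u][v]
        scores = nxt
    return scores


# Toy data: 3 images, 2 tags (say "beach" and "sunset").
sim = [[0.0, 0.9, 0.1],
       [0.9, 0.0, 0.2],
       [0.1, 0.2, 0.0]]
img_tag = [[1, 0],
           [1, 1],
           [0, 1]]

P = build_transition(sim, img_tag, alpha=0.5)
# Text-based query: start the walk at tag node 0 ("beach").
start = [0.0, 0.0, 0.0, 1.0, 0.0]
scores = random_walk(P, start)
image_scores = scores[:3]
print(image_scores)
```

Starting the walk at a tag node corresponds to text-based retrieval; starting it at an image node and reading the scores of the tag nodes would correspond to annotation under the same assumptions. Setting alpha closer to 1 makes the ranking depend more on visual similarity, and closer to 0 more on shared tags.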
