Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Because of the popularity of social media websites, extensive research effort has been dedicated to tag-based social image search. Both visual information and tags have been investigated. However, most existing methods use tags and visual characteristics either separately or sequentially to estimate the relevance of images. In this paper, we propose an approach that uses visual and textual information simultaneously to estimate the relevance of user-tagged images. Relevance is estimated with a hypergraph learning approach: a social image hypergraph is constructed in which vertices represent images and hyperedges represent visual or textual terms. Learning uses a set of pseudo-positive images, and the weights of the hyperedges are updated throughout the learning process, so that the impact of different tags and visual words is modulated automatically. Comparative results of experiments conducted on a dataset of 370+ images demonstrate the effectiveness of the proposed approach.
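
The hypergraph relevance estimation described above can be illustrated with a minimal sketch. It assumes the standard regularization framework for hypergraph learning (normalized hypergraph Laplacian plus a fitting term on the pseudo-positive labels), which the abstract does not spell out. The function name `hypergraph_relevance`, the parameter `lam`, and the toy incidence matrix are hypothetical. The hyperedge weights are held fixed here, whereas the proposed approach updates them during learning.

```python
import numpy as np

def hypergraph_relevance(H, w, y, lam=1.0):
    """Score images on a hypergraph (illustrative sketch, not the paper's exact method).

    H   : (n_images, n_edges) 0/1 incidence matrix; column j marks the images
          that contain tag j or visual word j.
    w   : (n_edges,) positive hyperedge weights (fixed in this sketch).
    y   : (n_images,) label vector, 1 for pseudo-positive images, else 0.
    lam : assumed trade-off between smoothness on the hypergraph and fit to y.
    Assumes every image lies in at least one hyperedge and no hyperedge is empty.
    """
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    n = H.shape[0]

    dv = H @ w            # vertex degrees: total weight of incident hyperedges
    de = H.sum(axis=0)    # hyperedge degrees: number of images per hyperedge
    dv_inv_sqrt = 1.0 / np.sqrt(dv)

    # Theta = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    theta = (dv_inv_sqrt[:, None] * H) @ np.diag(w / de) @ (H.T * dv_inv_sqrt[None, :])
    laplacian = np.eye(n) - theta

    # Minimizer of f^T L f + lam * ||f - y||^2 is f = lam * (L + lam I)^{-1} y
    return np.linalg.solve(laplacian + lam * np.eye(n), lam * y)

# Toy example: 4 images, 3 hyperedges (say two tags and one visual word).
H = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 0]])
w = np.ones(3)
y = np.array([1.0, 0.0, 0.0, 0.0])  # image 0 is the only pseudo-positive
scores = hypergraph_relevance(H, w, y)
ranking = np.argsort(-scores)        # image indices, most relevant first
```

In this toy setup, images that share a hyperedge with the pseudo-positive image receive higher scores than the image that shares none, which is the behavior the joint visual-textual hypergraph is meant to capture.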
