Social tag relevance learning via ranking-oriented neighbor voting

High-quality tags play a critical role in applications involving online multimedia search, such as social image annotation, sharing, and browsing. However, user-generated tags in the real world often describe image contents imprecisely and incompletely, which severely degrades the performance of current search systems. To improve the descriptive power of social tags, a fundamental issue is tag relevance learning, which concerns how to effectively interpret the relevance of a tag with respect to the contents of an image. In this paper, we investigate the problem from the new perspective of learning to rank, and develop a novel approach in which tag relevance learning directly optimizes the ranking performance of tag-based image search. Specifically, a supervision step is introduced into the neighbor voting scheme, in which the relevance of a tag is estimated by accumulating votes from visual neighbors. By explicitly modeling the neighbor weights and tag correlations, the approach avoids the risk of making heuristic assumptions. In addition, our approach does not suffer from scalability problems, since it learns a single generic model that can be applied to all tags. Extensive experiments on two benchmark datasets, in comparison with state-of-the-art methods, demonstrate the promise of our approach.
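To make the neighbor voting idea concrete, the following is a minimal sketch, not the paper's actual model. It scores each tag for a query image by summing weighted votes from its k nearest visual neighbors. The function name `neighbor_vote`, the Euclidean distance, and the per-rank `weights` vector are assumptions introduced for illustration; in the supervised setting described above, such weights would be learned from ranking supervision rather than set by hand, and tag correlations are not modeled here.

```python
import numpy as np

def neighbor_vote(query_feat, feats, tag_matrix, k=3, weights=None):
    """Score every tag for one image by accumulating neighbor votes.

    query_feat : (d,) visual feature of the query image
    feats      : (n_images, d) visual features of the tagged collection
    tag_matrix : (n_images, n_tags) binary matrix, 1 if the image carries the tag
    weights    : optional length-k per-rank neighbor weights (hypothetical
                 stand-in for learned weights); uniform voting when None
    """
    feats = np.asarray(feats, dtype=float)
    tag_matrix = np.asarray(tag_matrix, dtype=float)
    # Euclidean distance to every image (an assumed similarity measure).
    dists = np.linalg.norm(feats - np.asarray(query_feat, dtype=float), axis=1)
    # Indices of the k visually closest images, nearest first.
    nn = np.argsort(dists, kind="stable")[:k]
    w = np.ones(k) if weights is None else np.asarray(weights, dtype=float)
    # Each neighbor votes for the tags it carries, scaled by its weight.
    return w @ tag_matrix[nn]
```

With uniform weights this reduces to plain vote counting; passing a decreasing `weights` vector lets closer neighbors count more, which is the quantity a ranking-oriented learner could tune.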
