Image Tagging with Social Assistance

Image tagging, also known as image annotation and image concept detection, has been extensively studied in the literature. However, most existing approaches fail to achieve satisfactory performance because manually labeled training data is both scarce and unreliable. In this paper, we propose a new image tagging scheme, termed social assisted media tagging (SAMT), which leverages abundant user-generated images and their associated tags as "social assistance" for learning the classifiers. We focus on two major challenges: (a) the noisy tags associated with web images; and (b) the robustness required of the tagging model. We present a joint image tagging framework that simultaneously refines the erroneous tags of web images and learns reliable image classifiers. In particular, we devise a novel tag refinement module that identifies and eliminates noisy tags by exploiting and preserving the low-rank nature of the tag matrix and the structured sparsity of the tag errors. We also develop a robust image tagging module based on the ℓ2,p-norm for learning reliable image classifiers. The two modules are coupled within the joint framework so that each reinforces the other. Extensive experiments on two real-world social image databases demonstrate the superiority of the proposed approach over existing methods.
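To make the ℓ2,p-norm mentioned above concrete, the following is a minimal sketch of its standard definition: take the ℓ2 norm of each row of a matrix, then the ℓp (quasi-)norm of those row norms. The function name `l2p_norm`, the choice of NumPy, and the example matrix are illustrative assumptions; the abstract does not say which matrix the norm is applied to or which value of p is used.

```python
import numpy as np


def l2p_norm(W, p=1.0):
    """Standard l2,p-norm: (sum_i ||w_i||_2^p)^(1/p) over the rows w_i of W.

    Hypothetical helper for illustration only; not the paper's implementation.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    row_norms = np.linalg.norm(W, axis=1)  # l2 norm of each row
    return float(np.sum(row_norms ** p) ** (1.0 / p))


# Example matrix (assumed): the middle row is entirely zero.
W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [6.0, 8.0]])

# Row norms are 5, 0 and 10, so the l2,1-norm is their sum.
print(l2p_norm(W, p=1.0))  # 15.0
```

With p = 2 this reduces to the Frobenius norm. For p ≤ 1 the penalty tends to drive whole rows toward zero, which is the usual motivation for using it as a robust, structured-sparsity term.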
