Image Annotation Incorporating Low-Rankness, Tag and Visual Correlation and Inhomogeneous Errors

Tag-based image retrieval (TBIR) has drawn much attention in recent years due to the explosive growth in digital images and crowdsourced tags. However, TBIR still suffers from the incomplete and inaccurate tags provided by users, which poses a major challenge for tag-based image management applications. In this work, we propose a novel image annotation method that incorporates several priors: low-rankness, tag and visual correlation, and inhomogeneous errors. Highly representative CNN feature vectors are adopted to model the tag-visual correlation and narrow the semantic gap. We also extract word vectors for tags to measure the similarity between tags at the semantic level, which is more accurate than traditional frequency-based or graph-based methods. We use the Accelerated Proximal Gradient (APG) method to solve our model efficiently. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness and robustness of the proposed method.
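To make the tag-correlation prior concrete, the sketch below shows one common way to compare tags at the semantic level: cosine similarity between their word vectors. It is an illustration only, not the paper's implementation. The function names (`cosine_similarity`, `tag_similarity_matrix`) and the three-dimensional `TOY_VECTORS` are hypothetical; in practice the vectors would come from a pretrained word-embedding model, and the abstract does not specify the similarity measure actually used.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as unrelated to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def tag_similarity_matrix(tag_vectors):
    """Build a symmetric tag-by-tag similarity matrix.

    tag_vectors maps each tag to its word vector. Tags are sorted so that
    row and column order is deterministic.
    """
    tags = sorted(tag_vectors)
    matrix = [
        [cosine_similarity(tag_vectors[a], tag_vectors[b]) for b in tags]
        for a in tags
    ]
    return tags, matrix


# Hypothetical toy embeddings, chosen by hand for illustration only.
TOY_VECTORS = {
    "beach": [0.9, 0.1, 0.0],
    "sea": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

tags, sim = tag_similarity_matrix(TOY_VECTORS)
for tag, row in zip(tags, sim):
    print(tag, [round(x, 3) for x in row])
```

With these toy vectors, "beach" and "sea" score much higher than "beach" and "car". A matrix of this kind could serve as a tag-correlation term in an annotation model, under the assumptions stated above.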
