Socializing the Semantic Gap

Where previous reviews on content-based image retrieval emphasize what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image. It presents a comprehensive treatment of three closely linked problems: image tag assignment, tag refinement, and tag-based image retrieval. While existing works vary in their targeted tasks and methodology, they all rely on the key functionality of tag relevance, that is, estimating the relevance of a specific tag with respect to the visual content of a given image and its social context. By analyzing what information a specific method exploits to construct its tag relevance function and how such information is exploited, this article introduces a two-dimensional taxonomy to structure the growing literature, understand the ingredients of the main works, clarify their connections and differences, and recognize their merits and limitations. For a head-to-head comparison with the state of the art, a new experimental protocol is presented, with training sets containing 10,000, 100,000, and 1 million images, and an evaluation on three test sets contributed by various research groups. Eleven representative works are implemented and evaluated. Putting all this together, the survey aims to provide an overview of the past and foster progress for the near future.
