Assistive tagging: A survey of multimedia tagging with human-computer joint exploration

Along with the explosive growth of multimedia data, automatic multimedia tagging has attracted great interest from various research communities, such as computer vision, multimedia, and information retrieval. However, despite the great progress achieved over the past two decades, automatic tagging technologies still rarely achieve satisfactory performance on real-world multimedia data, which vary widely in genre, quality, and content. Meanwhile, the power of human intelligence has been amply demonstrated in the Web 2.0 era: if well motivated, Internet users are able to tag large amounts of multimedia data. Therefore, a set of new techniques that combine humans and computers for more accurate and efficient multimedia tagging has been developed, such as batch tagging, active tagging, tag recommendation, and tag refinement. These techniques accomplish multimedia tagging by jointly exploring human and computer capabilities in different ways. This article refers to them collectively as assistive tagging and conducts a comprehensive survey of existing research efforts on this theme. We first review the status of automatic and manual tagging and then explain why assistive tagging can be a good solution. We categorize existing assistive tagging techniques into three paradigms: (1) tagging with data selection and organization; (2) tag recommendation; and (3) tag processing. We introduce the research efforts on each paradigm and summarize their methodologies. We also discuss several future trends in this research direction.
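To make one of the named paradigms, tag recommendation, more concrete, here is a minimal sketch of a simple co-occurrence scheme. It assumes a corpus of items that users have already tagged. Given the tags a user has just entered, it ranks each candidate tag c by summing P(c | u) = count(u, c) / count(u) over the entered tags u. This is only one simple illustration and is not presented as the method of any particular surveyed work. The function names `build_cooccurrence` and `recommend_tags` and the toy corpus are hypothetical.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(tagged_items):
    """Count per-tag frequency and ordered co-occurrence of distinct tag pairs.

    Hypothetical helper: each item is a list of tags; duplicates within an
    item are counted once.
    """
    tag_count = Counter()
    pair_count = Counter()
    for tags in tagged_items:
        unique = set(tags)
        tag_count.update(unique)
        # Every ordered pair (a, b) of distinct tags on the same item.
        pair_count.update(permutations(unique, 2))
    return tag_count, pair_count


def recommend_tags(user_tags, tag_count, pair_count, k=3):
    """Rank candidate tags c by sum over user tags u of count(u, c) / count(u)."""
    entered = set(user_tags)
    scores = Counter()
    for u in entered:
        if tag_count[u] == 0:
            continue  # tag never seen in the corpus: no evidence to vote with
        for (a, c), n in pair_count.items():
            if a == u and c not in entered:
                scores[c] += n / tag_count[u]
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:k]]


# Toy corpus (hypothetical data for illustration only).
corpus = [
    ["beach", "sea", "sunset"],
    ["beach", "sea", "sand"],
    ["sunset", "sky"],
    ["beach", "sand", "summer"],
]
tag_count, pair_count = build_cooccurrence(corpus)
print(recommend_tags(["beach"], tag_count, pair_count))  # prints ['sand', 'sea', 'summer']
```

In an assistive setting, the computer proposes such a ranked list and the human accepts or rejects entries. The machine's job therefore shrinks from producing final labels to producing useful suggestions.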
