Data-driven approaches for social image and video tagging

The great success of online social platforms for creating, sharing and tagging user-generated media has led to strong interest from the multimedia and computer vision communities in methods and techniques for annotating and searching social media. Visual content similarity, geo-tags and tag co-occurrence, together with social connections and comments, can be exploited to suggest tags, to classify and cluster content, and to enable more effective semantic indexing and retrieval of visual data. However, two obstacles must be overcome. First, these metadata are of relatively low quality: user-produced tags and annotations are known to be ambiguous, imprecise and/or incomplete, overly personalized and limited. Second, any method must cope with the ‘web-scale’ quantity of media and with the fact that social network users continuously add new images and create new terms. We will review state-of-the-art approaches to automatic annotation and tag refinement for social images, also considering the temporal patterns of tag usage, and discuss extensions to tag suggestion and localization in web video sequences.
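Many of the annotation and tag refinement approaches mentioned above rest on one intuition: if many of an image's visually similar neighbours carry a tag, that tag is probably relevant to the image. A minimal sketch of one such scheme, neighbour voting, follows. It assumes each image is already represented by a fixed-length feature vector compared with Euclidean distance. The function name, the data layout and the toy collection are all hypothetical, and the sketch is a simplified illustration rather than any specific published implementation. A tag's score is the number of the k nearest neighbours that carry it, minus the number expected if k images were drawn at random from the collection.

```python
import math
from collections import Counter


def neighbor_vote_relevance(query_feat, collection, k):
    """Score tags for a query image by neighbour voting (simplified sketch).

    collection: list of (feature_vector, set_of_tags) pairs (hypothetical format).
    Returns {tag: votes among k nearest neighbours - votes expected by chance}.
    """
    n = len(collection)
    # Prior: number of images in the whole collection that carry each tag.
    tag_freq = Counter(tag for _, tags in collection for tag in tags)
    # The k visually nearest images under Euclidean distance.
    neighbours = sorted(collection, key=lambda item: math.dist(query_feat, item[0]))[:k]
    # Each neighbour casts one vote for each of its tags.
    votes = Counter(tag for _, tags in neighbours for tag in tags)
    # Subtract the expected count so that very common, generic tags are not favoured.
    return {tag: votes[tag] - k * tag_freq[tag] / n for tag in tag_freq}


# Toy collection with 2-D "features" (hypothetical data for illustration only).
collection = [
    ((0.0, 0.0), {"beach", "sea"}),
    ((0.1, 0.0), {"sea", "sunset"}),
    ((0.0, 0.2), {"beach", "holiday"}),
    ((5.0, 5.0), {"city", "night"}),
    ((5.1, 4.9), {"city", "car"}),
    ((4.8, 5.2), {"night", "holiday"}),
]

scores = neighbor_vote_relevance((0.05, 0.05), collection, k=3)
# Rank by descending score, breaking ties alphabetically for a stable order.
ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
print([tag for tag, _ in ranked[:3]])  # ['beach', 'sea', 'sunset']
```

In this toy example "holiday" scores 0.0 even though a neighbour carries it, because it is just as frequent across the whole collection: the chance-corrected count separates tags that describe the visual content from generic ones. Suggesting tags for a new image then amounts to returning the highest-scoring tags, while tag refinement re-ranks or filters the tags a user has already assigned.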
