Evaluating sources and strategies for learning video concepts from social media

Learning video concept detectors from social media sources, such as Flickr images and YouTube videos, has the potential to address a wide variety of concept queries for video search. While this potential has been widely recognized and progress on the topic has been impressive, we argue that two key questions remain open: What visual tagging source is best suited for selecting positive training examples to learn video concepts? And what strategy should be used for selecting positive examples from tagged sources? As an initial attempt to answer these two questions, we conduct an experimental study using a video search engine capable of learning concept detectors from social media, be it socially tagged videos or socially tagged images. Within the video search engine we investigate six strategies for selecting positive examples. Performance is evaluated on the challenging TRECVID 2011 benchmark, comprising 400 hours of Internet videos. The experiments lead to novel and nontrivial findings: (1) tagged images are the better source for learning video concepts from the web, (2) selecting tag-relevant examples as positives for learning video concepts is always beneficial and can be done automatically, and (3) the best source and strategy compare favorably against several present-day methods.
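
The abstract does not detail how tag-relevant positives are selected automatically. As a purely illustrative sketch, the snippet below implements one plausible instantiation drawn from the social-image-retrieval literature: neighbor-voting tag relevance, where a tagged image is kept as a positive if unusually many of its visual nearest neighbors carry the same tag. All names and parameters (`features`, `tags`, `k`, `n_pos`) are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: the paper's exact selection strategy is not
# specified in the abstract. This implements a neighbor-voting style
# tag-relevance score over socially tagged images.
import numpy as np

def tag_relevance(features, tags, concept, k=50):
    """Score each image tagged with `concept` by how many of its k visual
    nearest neighbors (cosine similarity) also carry the tag, minus the
    number of votes expected by chance from the tag's overall frequency."""
    tagged = np.array([concept in t for t in tags])    # boolean tag mask
    prior = k * tagged.mean()                          # chance-level votes
    X = features / np.linalg.norm(features, axis=1, keepdims=True)
    idx = np.flatnonzero(tagged)
    scores = np.empty(len(idx))
    for out, i in enumerate(idx):
        sims = X @ X[i]
        sims[i] = -np.inf                              # exclude the image itself
        nn = np.argpartition(-sims, k)[:k]             # k nearest neighbors
        scores[out] = tagged[nn].sum() - prior         # votes above chance
    return idx, scores

def select_positives(features, tags, concept, k=50, n_pos=100):
    """Return indices of the n_pos most tag-relevant images for `concept`,
    to be used as positive training examples for a concept detector."""
    idx, scores = tag_relevance(features, tags, concept, k)
    return idx[np.argsort(-scores)[:n_pos]]
```

Under this reading, the selected positives would then feed a standard classifier (e.g., an SVM on visual features) to produce the concept detector evaluated on the video benchmark.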
