Crowdsourcing Event Detection in YouTube Videos

Considerable effort has gone into making video content on the Web more accessible, searchable, and navigable, through research on both textual and visual analysis of the video content itself and of the accompanying metadata. Nevertheless, most of the time, videos remain opaque objects in websites. With Web browsers gaining more support for the HTML5 <video> element, videos are becoming first-class citizens on the Web. In this paper we show how events can be detected on-the-fly by crowdsourcing (i) textual, (ii) visual, and (iii) behavioral analysis in YouTube videos, at scale. The main contribution of this paper is a generic crowdsourcing framework for automatic and scalable semantic annotation of HTML5 videos. Finally, we discuss our preliminary results, using traditional server-based approaches to video event detection as a baseline.
