LabelMovie: Semi-supervised machine annotation tool with quality assurance and crowd-sourcing options for videos

Automatic annotation of video recordings is challenging for several reasons: the number of video instances to be annotated is huge, manual labeling sessions are tedious, multi-modal annotation requires exact spatial, temporal, and contextual information, and the different labeling options require agreement between annotators. Crowd-sourcing with quality assurance by experts may come to the rescue here. We have developed a tool for this purpose: individual experts can annotate videos over the Internet, their work can be joined and filtered, the annotated material can be evaluated by machine learning methods, and automated annotation may start according to a predefined confidence level. A relatively small number of manually labeled instances may efficiently bootstrap the machine annotation procedure. We present new mathematical concepts and algorithms for semi-supervised induction and the corresponding manual annotation tool, which features special visualization methods for crowd-sourced users. A special feature is that the annotation tool can be used by people who are not familiar with machine learning methods; for example, they can start and manage a complex bootstrapping process.
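The abstract does not specify the learner or how confidence is measured, so the following is only a minimal sketch of confidence-thresholded bootstrapping from a small labeled seed set. It assumes a nearest-centroid classifier and a softmax over negative distances as the confidence score; both are illustrative stand-ins, not the method of the paper, and the names `centroids`, `predict`, and `bootstrap` are hypothetical.

```python
import math

def centroids(labeled):
    """Mean feature vector per label, from (features, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(cents, x):
    """Return (label, confidence); confidence is a softmax over negative distances.

    This confidence measure is an assumption made for the sketch.
    """
    scores = {y: math.exp(-math.dist(x, c)) for y, c in cents.items()}
    total = sum(scores.values())
    label = max(scores, key=scores.get)
    return label, scores[label] / total

def bootstrap(labeled, unlabeled, threshold=0.9, max_rounds=10):
    """Iteratively machine-label instances whose confidence meets the threshold.

    Returns the grown labeled set and the instances still left for human annotators.
    """
    labeled = list(labeled)
    pending = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        accepted, remaining = [], []
        for x in pending:
            label, conf = predict(cents, x)
            (accepted if conf >= threshold else remaining).append((x, label))
        if not accepted:
            break  # nothing confident enough; stop and defer to humans
        labeled.extend(accepted)
        pending = [x for x, _ in remaining]
    return labeled, pending

# Two manually labeled seed instances and three unlabeled ones (toy data).
seed = [((0.0, 0.0), "neutral"), ((10.0, 10.0), "smile")]
unlabeled = [(0.5, 0.2), (9.5, 9.8), (5.0, 5.0)]
machine_labeled, for_humans = bootstrap(seed, unlabeled, threshold=0.9)
```

In this toy run, the two instances close to a seed are labeled automatically, while the ambiguous midpoint stays below the threshold and is left for manual annotation.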
