DFKI and University of Kaiserslautern participation at TRECVID 2010 - Semantic Indexing Task

Run No.  Run ID          Run Description                                                                              infMAP (%)
Training on IACC data:
1        F A DFKI-MADM 3  SIFT visual words, color correlograms, and face detection separately trained; late fusion of SVM scores  5.0
2        F A DFKI-MADM 4  SIFT visual words with SVMs                                                                  4.4
Training on YouTube data:
3        F D DFKI-MADM 1  SIFT visual words, color correlograms, and face detection separately trained; late fusion of SVM scores  2.1
4        F B DFKI-MADM 2  SIFT visual words with SVMs                                                                  1.3

This paper describes the TRECVID 2010 participation of the DFKI-MADM team in the semantic indexing task. This year's participation was dominated by two aspects: a new dataset and a large vocabulary of 130 concepts. For the annual TRECVID benchmark, this means scaling label annotation efforts to significantly larger concept vocabularies and datasets. To reduce this effort, our intention is to automatically acquire training data from online video portals like YouTube and to use the tags associated with each video as concept labels. Results for the evaluated subset of concepts show, similarly to last year's participation [3], that effects like label noise and domain change lead to a performance loss (infMAP 2.1% and 1.3%) compared to purely TRECVID-trained concept detectors (infMAP 5.0% and 4.4%). Nevertheless, for individual concepts like "demonstration or protest" or "bus", automatic learning from online video portals is a valid alternative to expert-labeled training datasets. Furthermore, the results also show that fusing multiple features improves detection precision.
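The late-fusion runs combine the confidence scores of SVMs trained separately on each feature channel (SIFT visual words, color correlograms, face detection) into a single per-shot score. The paper does not give the fusion rule; the sketch below assumes a simple (weighted) average of per-channel scores, with hypothetical score values for illustration:

```python
import numpy as np

def late_fusion(score_lists, weights=None):
    """Fuse per-channel classifier scores by (weighted) averaging.

    score_lists: list of 1-D arrays, one per feature channel,
    each holding one SVM confidence score per video shot.
    Returns one fused score per shot.
    """
    scores = np.vstack(score_lists)  # shape: (n_channels, n_shots)
    if weights is None:
        # uniform weighting over channels (an assumption, not from the paper)
        weights = np.full(len(score_lists), 1.0 / len(score_lists))
    return weights @ scores          # shape: (n_shots,)

# Hypothetical scores for 3 shots from 3 feature channels
sift  = np.array([0.9, 0.2, 0.6])   # SIFT bag-of-words SVM
color = np.array([0.8, 0.1, 0.5])   # color-correlogram SVM
faces = np.array([0.7, 0.3, 0.4])   # face-detection SVM

fused = late_fusion([sift, color, faces])
```

Shots are then ranked by the fused score to produce the submitted run; per-channel weights could also be tuned on held-out data instead of being uniform.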

[1] Stephen Kwek, et al. Applying Support Vector Machines to Imbalanced Datasets. ECML, 2004.

[2] B. S. Manjunath, et al. Color and texture descriptors. IEEE Trans. Circuits Syst. Video Technol., 2001.

[3] Markus Koch, et al. Learning automatic concept detectors from online video. Comput. Vis. Image Underst., 2010.

[4] Markus Koch, et al. Learning TRECVID'08 High-Level Features from YouTube. TRECVID, 2008.

[5] Adrian Ulges, et al. Keyframe Extraction for Video Tagging & Summarization. Informatiktage, 2008.

[6] Marcel Worring, et al. Learning tag relevance by neighbor voting for social image retrieval. MIR '08, 2008.

[7] Markus Koch, et al. TubeTagger - YouTube-based Concept Detection. IEEE International Conference on Data Mining Workshops, 2009.

[8] Marcel Worring, et al. Concept-Based Video Retrieval. Found. Trends Inf. Retr., 2009.

[9] Rong Yan, et al. How many high-level concepts will fill the semantic gap in news video retrieval? CIVR '07, 2007.

[10] Jun Yang, et al. A framework for classifier adaptation and its applications in concept detection. MIR '08, 2008.

[11] Cordelia Schmid, et al. Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study. CVPRW '06, 2006.

[12] David G. Lowe, et al. Distinctive Image Features from Scale-Invariant Keypoints. 2004.

[13] Jing Huang, et al. Image indexing using color correlograms. IEEE CVPR, 1997.

[14] Adrian Ulges, et al. Visual Concept Learning from Weakly Labeled Web Videos. Video Search and Mining, 2010.

[15] G. Schwarz. Estimating the Dimension of a Model. 1978.

[16] Stéphane Ayache, et al. Video Corpus Annotation Using Active Learning. ECIR, 2008.

[17] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines. ACM TIST, 2011.

[18] Marcel Worring, et al. Annotating images by harnessing worldwide user-tagged photos. IEEE ICASSP, 2009.

[19] Damian Borth, et al. DFKI-IUPR participation in TRECVID'09 High-level Feature Extraction Task. TRECVID, 2009.