The challenge problem for automated detection of 101 semantic concepts in multimedia

We introduce the challenge problem for generic video indexing to gain insight in intermediate steps that affect performance of multimedia analysis methods, while at the same time fostering repeatability of experiments. To arrive at a challenge problem, we provide a general scheme for the systematic examination of automated concept detection methods, by decomposing the generic video indexing problem into 2 unimodal analysis experiments, 2 multimodal analysis experiments, and 1 combined analysis experiment. For each experiment, we evaluate generic video indexing performance on 85 hours of international broadcast news data, from the TRECVID 2005/2006 benchmark, using a lexicon of 101 semantic concepts. By establishing a minimum performance on each experiment, the challenge problem allows for component-based optimization of the generic indexing issue, while simultaneously offering other researchers a reference for comparison during indexing methodology development. To stimulate further investigations in intermediate analysis steps that inuence video indexing performance, the challenge offers to the research community a manually annotated concept lexicon, pre-computed low-level multimedia features, trained classifier models, and five experiments together with baseline performance, which are all available at http://www.mediamill.nl/challenge/.

[1]  Jean Tague-Sutcliffe,et al.  The Pragmatics of Information Retrieval Experimentation Revisited , 1997, Inf. Process. Manag..

[2]  Takeo Kanade,et al.  Informedia Digital Video Library , 1995, CACM.

[3]  Shih-Fu Chang,et al.  Visually Searching the Web for Content , 1997, IEEE Multim..

[4]  Wolfgang Effelsberg,et al.  On the detection and recognition of television commercials , 1997, Proceedings of IEEE International Conference on Multimedia Computing and Systems.

[5]  Yihong Gong,et al.  Lessons Learned from Building a Terabyte Digital Video Library , 1999, Computer.

[6]  Treebank Penn,et al.  Linguistic Data Consortium , 1999 .

[7]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Anoop Gupta,et al.  Automatically extracting highlights for TV Baseball programs , 2000, ACM Multimedia.

[9]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[10]  P. Bartlett,et al.  Probabilities for SV Machines , 2000 .

[11]  Milind R. Naphade,et al.  A probabilistic framework for semantic video indexing, filtering, and retrieval , 2001, IEEE Trans. Multim..

[12]  Brian V. Funt,et al.  A data set for color research , 2002 .

[13]  Jean-Luc Gauvain,et al.  The LIMSI Broadcast News transcription system , 2002, Speech Commun..

[14]  Georges Quénot,et al.  CLIPS at TREC 11: Experiments in Video Retrieval , 2002, TREC.

[15]  Ching-Yung Lin,et al.  Video Collaborative Annotation Forum: Establishing Ground-Truth Labels on Large Multimedia Datasets , 2003, TRECVID.

[16]  Christian Petersohn Fraunhofer HHI at TRECVID 2004: Shot Boundary Detection System , 2004, TRECVID.

[17]  Milind R. Naphade On supervision and statistical learning for semantic multimedia analysis , 2004, J. Vis. Commun. Image Represent..

[18]  Alexander G. Hauptmann,et al.  Towards a Large Scale Concept Ontology for Broadcast Video , 2004, CIVR.

[19]  Paul Over,et al.  TRECVID: evaluating the effectiveness of information retrieval tasks on digital video , 2004, MULTIMEDIA '04.

[20]  Patrick J. Flynn,et al.  Overview of the face recognition grand challenge , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[21]  John R. Smith,et al.  A web-based system for collaborative annotation of large image and video collections: an evaluation and user study , 2005, MULTIMEDIA '05.

[22]  Cees G. M. Snoek,et al.  Early versus late fusion in semantic video analysis , 2005, MULTIMEDIA '05.

[23]  Marcel Worring,et al.  MediaMill: exploring news video archives based on learned semantics , 2005, MULTIMEDIA '05.

[24]  Ramesh C. Jain,et al.  ACM SIGMM retreat report on future directions in multimedia research , 2005, TOMCCAP.

[25]  Sudeep Sarkar,et al.  The humanID gait challenge problem: data sets, performance, and analysis , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Joshua R. Smith,et al.  A Web-based System for Collaborative Annotation of Large Image and Video Collections , 2005 .

[27]  Alan F. Smeaton,et al.  Large Scale Evaluations of Multimedia Information Retrieval: The TRECVid Experience , 2005, CIVR.

[28]  Cordelia Schmid,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[29]  Marcel Worring,et al.  The Semantic Pathfinder: Using an Authoring Metaphor for Generic Multimedia Indexing , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Cor J. Veenman,et al.  Robust Scene Categorization by Learning Image Statistics in Context , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[31]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  John R. Smith,et al.  Large-scale concept ontology for multimedia , 2006, IEEE MultiMedia.