Evaluation of active learning strategies for video indexing

In this paper, we compare active learning strategies for indexing concepts in video shots. Active learning is simulated using subsets of a fully annotated dataset rather than through actual user intervention. Training uses the collaborative annotation of 39 concepts from the TRECVID 2005 campaign, and performance is measured on the 20 concepts selected for the TRECVID 2006 concept detection task. The simulation makes it possible to explore the effect of several parameters: the selection strategy, the annotated fraction of the dataset, the number of iterations, and the relative difficulty of the concepts. Three strategies were compared: the first two select the most probable and the most uncertain samples, respectively, while the third selects samples at random. For easy concepts, the "most probable" strategy performs best when less than 15% of the dataset is annotated, and the "most uncertain" strategy performs best when 15% or more is annotated. The two strategies are roughly equivalent for moderately difficult and difficult concepts. In all cases, maximum performance is reached when 12 to 15% of the whole dataset is annotated.
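
To make the three selection strategies concrete, the following is a minimal sketch of one active learning iteration's sample selection, assuming the concept classifier outputs a probability score in [0, 1] for each unlabeled shot. The function name, parameters, and the use of 0.5 as the uncertainty reference point are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def select_samples(probs, k, strategy, rng=None):
    """Pick k unlabeled shots to send for annotation.

    probs: 1-D array of classifier scores P(concept | shot) in [0, 1],
           one per currently unlabeled shot (illustrative assumption).
    strategy: 'most_probable', 'most_uncertain', or 'random'.
    Returns indices into probs.
    """
    if strategy == "most_probable":
        # Highest predicted probability of containing the concept.
        return np.argsort(probs)[::-1][:k]
    if strategy == "most_uncertain":
        # Closest to the decision boundary, i.e. score nearest 0.5.
        return np.argsort(np.abs(probs - 0.5))[:k]
    if strategy == "random":
        # Uniform random choice without replacement.
        rng = rng or np.random.default_rng()
        return rng.choice(len(probs), size=k, replace=False)
    raise ValueError(f"unknown strategy: {strategy}")
```

In a simulated campaign of this kind, the selected indices would be "annotated" by looking up their ground-truth labels in the fully annotated dataset, the classifier retrained, and the loop repeated for the chosen number of iterations.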
