Learning to annotate video databases

A model-based approach to video retrieval requires ground-truth data for training the models. This need has led to the development of video annotation tools that allow users to annotate each shot in a video sequence and to identify and label scenes, events, and objects by applying labels at the shot level. The annotation tool considered here also allows the user to associate object labels with an individual region in a key-frame image. However, the abundance of video data and the diversity of labels make annotation a difficult and overly expensive task. To address this problem, we formulate annotation as supervised training with partially labeled data and treat it as an exercise in active learning. In this scenario, one first trains a classifier on a small set of labeled data and then updates the classifier by selecting, for labeling, the most informative (that is, most uncertain) subset of the available data. As a consequence, labels are also propagated automatically to the data that remain unlabeled. The purpose of this paper is twofold. The first is to describe a video annotation tool developed for annotating generic video sequences in the context of a recent video-TREC benchmarking exercise. The tool is semi-automatic: it automatically propagates labels to similar shots, and the user then confirms or rejects the propagated labels. The second is to show how an active learning strategy could be implemented in this context to further improve the performance of the annotation tool. While many versions of active learning are conceivable, we specifically report results of experiments with support vector machine classifiers with polynomial kernels.
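The active learning loop described above can be illustrated with a minimal sketch. It is not the tool's implementation: to stay dependency-free it swaps the support vector machine for a kernel perceptron that uses the same kind of polynomial kernel, and it assumes each shot is summarized by a hypothetical 2-D feature vector. The names `poly_kernel`, `KernelPerceptron`, `annotate`, and `ask_user`, as well as the kernel degree and the labeling budget, are assumptions made for illustration.

```python
def poly_kernel(a, b, degree=2, coef0=1.0):
    """Polynomial kernel (a . b + coef0) ** degree."""
    return (sum(x * y for x, y in zip(a, b)) + coef0) ** degree


class KernelPerceptron:
    """Simple stand-in for an SVM: a perceptron in polynomial-kernel space."""

    def __init__(self, epochs=50):
        self.epochs = epochs
        self.X, self.y, self.alpha = [], [], []

    def decision(self, x):
        # Signed score; values near zero mean the classifier is uncertain.
        return sum(a * yi * poly_kernel(xi, x)
                   for a, xi, yi in zip(self.alpha, self.X, self.y) if a)

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        self.alpha = [0] * len(self.X)
        for _ in range(self.epochs):
            mistakes = 0
            for i in range(len(self.X)):
                if self.y[i] * self.decision(self.X[i]) <= 0:
                    self.alpha[i] += 1  # reinforce a misclassified example
                    mistakes += 1
            if mistakes == 0:
                break
        return self


def annotate(pool, ask_user, seed_indices, budget):
    """Pool-based active learning with uncertainty sampling.

    pool: feature vectors, one per shot (hypothetical representation).
    ask_user: callable returning +1 or -1, standing in for the annotator.
    Returns (user-labeled shots, labels propagated to the remaining shots).
    """
    labeled = {i: ask_user(pool[i]) for i in seed_indices}

    def train():
        idx = sorted(labeled)
        return KernelPerceptron().fit([pool[i] for i in idx],
                                      [labeled[i] for i in idx])

    for _ in range(budget):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        clf = train()
        # Query the shot whose score is closest to the decision boundary.
        query = min(unlabeled, key=lambda i: abs(clf.decision(pool[i])))
        labeled[query] = ask_user(pool[query])

    clf = train()
    propagated = {i: (1 if clf.decision(pool[i]) > 0 else -1)
                  for i in range(len(pool)) if i not in labeled}
    return labeled, propagated


# Toy data: a 9 x 9 grid of "shots"; the concept is a disc around the origin.
pool = [(x / 2, y / 2) for x in range(-4, 5) for y in range(-4, 5)]


def ask_user(v):
    return 1 if v[0] ** 2 + v[1] ** 2 < 1.5 else -1


seeds = [0, pool.index((0.0, 0.0))]  # one negative and one positive shot
labeled, propagated = annotate(pool, ask_user, seeds, budget=10)
print(len(labeled), "shots labeled by the user;",
      len(propagated), "labels propagated")
```

In this sketch the user is asked only about the shots the current classifier is least sure of, and every remaining shot receives a propagated label that the user could then confirm or reject.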
