Automatically labeling video data using multi-class active learning

Labeling video data is an essential prerequisite for many vision applications that depend on training data, such as visual information retrieval, object recognition, and human activity modelling. However, manually creating labels is not only time-consuming but also prone to human error, and it eventually becomes infeasible for very large amounts of data (e.g., 24/7 surveillance video). To minimize the human effort in labeling, we propose a unified multi-class active learning approach for automatically labeling video data. This work extends active learning from binary to multiple classes and evaluates several practical sample selection strategies. The experimental results show that the proposed approach remains effective even with a significantly reduced amount of labeled data. The best sample selection strategy achieves more than a 50% error reduction over random sample selection.
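To make the idea of a sample selection strategy concrete, the following is a minimal sketch, not the authors' method. The abstract does not name the strategies that were evaluated, so the smallest-margin criterion used here is an assumption chosen for illustration, as are the function names and the per-class scores. It contrasts picking the most ambiguous unlabeled samples with the random baseline mentioned above.

```python
import random

def margin(scores):
    """Gap between the two highest class scores; a small gap means an ambiguous sample."""
    top, second = sorted(scores, reverse=True)[:2]
    return top - second

def select_most_uncertain(pool_scores, k=1):
    """Return indices of the k unlabeled samples with the smallest margin (assumed strategy)."""
    ranked = sorted(range(len(pool_scores)), key=lambda i: margin(pool_scores[i]))
    return ranked[:k]

def select_random(pool_scores, k=1, seed=0):
    """Baseline: pick k unlabeled samples uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(range(len(pool_scores)), k)

# Hypothetical per-class scores for four unlabeled video samples and three classes.
pool_scores = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # ambiguous
    [0.50, 0.48, 0.02],  # most ambiguous between the top two classes
    [0.70, 0.20, 0.10],
]

# Samples a human annotator would be asked to label next.
queried = select_most_uncertain(pool_scores, k=2)
print(queried)
```

In a full active learning loop, the queried samples would be labeled by a human, added to the training set, and the classifier retrained before the next round of selection.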