Automatic video annotation by semi-supervised learning with kernel density estimation

Insufficiency of labeled training data is a major obstacle for automatically annotating large-scale video databases with semantic concepts. Existing semi-supervised learning algorithms based on parametric models try to tackle this issue by incorporating the information in a large amount of unlabeled data. However, they are based on a "model assumption" that the assumed generative model is correct, which usually cannot be satisfied in automatic video annotation due to the large variations of video semantic concepts. In this paper, we propose a novel semi-supervised learning algorithm, named Semi Supervised Learning by Kernel Density Estimation (SSLKDE), which is based on a non-parametric method, and therefore the "model assumption" is avoided. While only labeled data are utilized in the classical Kernel Density Estimation (KDE) approach, in SSLKDE both labeled and unlabeled data are leveraged to estimate class conditional probability densities based on an extended form of KDE. We also investigate the connection between SSLKDE and existing graph-based semi-supervised learning algorithms. Experiments prove that SSLKDE significantly outperforms existing supervised methods for video annotation.

[1]  Xiaojin Zhu,et al.  --1 CONTENTS , 2006 .

[2]  Hoon Kim,et al.  Monte Carlo Statistical Methods , 2000, Technometrics.

[3]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[4]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[5]  Bernhard Schölkopf,et al.  Learning with Local and Global Consistency , 2003, NIPS.

[6]  Zoubin Ghahramani,et al.  Learning from labeled and unlabeled data with label propagation , 2002 .

[7]  Christian P. Robert,et al.  Monte Carlo Statistical Methods (Springer Texts in Statistics) , 2005 .

[8]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[9]  Meng Wang,et al.  Semi-automatic video annotation based on active learning with multiple complementary predictors , 2005, MIR '05.

[10]  Jingrui He,et al.  Manifold-ranking based image retrieval , 2004, MULTIMEDIA '04.

[11]  Mikhail Belkin,et al.  Regularization and Semi-supervised Learning on Large Graphs , 2004, COLT.

[12]  Vittorio Castelli,et al.  The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter , 1996, IEEE Trans. Inf. Theory.

[13]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[14]  E. Parzen On Estimation of a Probability Density Function and Mode , 1962 .

[15]  Nicolas Le Roux,et al.  Efficient Non-Parametric Function Induction in Semi-Supervised Learning , 2004, AISTATS.

[16]  R. Manmatha,et al.  Statistical models for automatic video annotation and retrieval , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[17]  Tong Zhang,et al.  The Value of Unlabeled Data for Classification Problems , 2000, ICML 2000.

[18]  Rong Yan,et al.  Semi-supervised cross feature learning for semantic concept detection in videos , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[19]  G. W. Snedecor Statistical Methods , 1964 .

[20]  L. Devroye,et al.  An equivalence theorem for L1 convergence of the kernel regression estimate , 1989 .

[21]  Sanjeev Khudanpur,et al.  Hidden Markov models for automatic annotation and content-based retrieval of images and video , 2005, SIGIR '05.

[22]  Fabio Gagliardi Cozman,et al.  Semi-supervised Learning of Classifiers : Theory , Algorithms and Their Application to Human-Computer Interaction , 2004 .

[23]  Nicu Sebe,et al.  Semisupervised learning of classifiers: theory, algorithms, and their application to human-computer interaction , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Martial Hebert,et al.  Semi-Supervised Self-Training of Object Detection Models , 2005, 2005 Seventh IEEE Workshops on Applications of Computer Vision (WACV/MOTION'05) - Volume 1.

[25]  David J. Hand,et al.  Kernel Discriminant Analysis , 1983 .

[26]  Mingjing Li,et al.  Boosting image orientation detection with indoor vs. outdoor classification , 2002, Sixth IEEE Workshop on Applications of Computer Vision, 2002. (WACV 2002). Proceedings..

[27]  Avrim Blum,et al.  Learning from Labeled and Unlabeled Data using Graph Mincuts , 2001, ICML.

[28]  Luc Devroye,et al.  The Consistency of Automatic Kernel Density Estimates , 1984 .

[29]  Fabio Gagliardi Cozman,et al.  Semi-Supervised Learning of Mixture Models and Bayesian Networks , 2003 .