Video Annotation Based on Kernel Linear Neighborhood Propagation

The insufficiency of labeled training data for representing the distribution of the entire dataset is a major obstacle in automatic semantic annotation of large-scale video database. Semi-supervised learning algorithms, which attempt to learn from both labeled and unlabeled data, are promising to solve this problem. In this paper, a novel graph-based semi-supervised learning method named kernel linear neighborhood propagation (KLNP) is proposed and applied to video annotation. This approach combines the consistency assumption, which is the basic assumption in semi-supervised learning, and the local linear embedding (LLE) method in a nonlinear kernel-mapped space. KLNP improves a recently proposed method linear neighborhood propagation (LNP) by tackling the limitation of its local linear assumption on the distribution of semantics. Experiments conducted on the TRECVID data set demonstrate that this approach outperforms other popular graph-based semi-supervised learning methods for video semantic annotation.

[1]  Bernhard Schölkopf,et al.  The Kernel Trick for Distances , 2000, NIPS.

[2]  Rong Yan,et al.  Semi-supervised cross feature learning for semantic concept detection in videos , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[3]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[4]  Meng Wang,et al.  Manifold-ranking based video concept detection on large database and feature pool , 2006, MM '06.

[5]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[6]  Bernhard Schölkopf,et al.  Learning with Local and Global Consistency , 2003, NIPS.

[7]  Xuelong Li,et al.  Negative Samples Analysis in Relevance Feedback , 2007, IEEE Transactions on Knowledge and Data Engineering.

[8]  Meng Wang,et al.  Structure-sensitive manifold ranking for video concept detection , 2007, ACM Multimedia.

[9]  Xuelong Li,et al.  Direct kernel biased discriminant analysis: a new content-based image retrieval relevance feedback algorithm , 2006, IEEE Transactions on Multimedia.

[10]  Helen C. Shen,et al.  Semi-Supervised Classification Using Linear Neighborhood Propagation , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  Changhu Wang,et al.  Image annotation refinement using random walk with restarts , 2006, MM '06.

[12]  John R. Smith,et al.  Large-scale concept ontology for multimedia , 2006, IEEE MultiMedia.

[13]  Mikhail Belkin,et al.  Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering , 2001, NIPS.

[14]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[15]  Meng Wang,et al.  Semi-automatic video annotation based on active learning with multiple complementary predictors , 2005, MIR '05.

[16]  Fei Wang,et al.  Label Propagation through Linear Neighborhoods , 2008, IEEE Trans. Knowl. Data Eng..

[17]  Alexander Zien,et al.  Semi-Supervised Learning , 2006 .

[18]  Lawrence K. Saul,et al.  Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifold , 2003, J. Mach. Learn. Res..

[19]  Xian-Sheng Hua,et al.  Kernel-Based Linear Neighborhood Propagation for Semantic Video Annotation , 2007, PAKDD.

[20]  Tao Mei,et al.  Video annotation based on temporally consistent Gaussian random field , 2007 .