Multi-Graph Semi-Supervised Learning for Video Semantic Feature Extraction

This paper proposes a video semantic feature extraction approach based on multi-graph semi-supervised learning, which aims to address both the insufficiency of training data and the curse of dimensionality. Traditional graph-based semi-supervised learning builds a single graph from high-dimensional low-level features. In contrast, we separate the original low-level features into multiple modalities with minimal correlation among them and construct one graph per modality. Because each graph is built in a lower-dimensional subspace, this separation mitigates the curse of dimensionality caused by the high-dimensional feature space. We then propose a criterion for optimally fusing these graphs based on the pairwise relationships among training samples, and perform semi-supervised learning on the fused graph. Experimental results demonstrate the effectiveness of the proposed approach.
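The pipeline described above (one graph per modality, fusion, then semi-supervised learning on the fused graph) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the fusion criterion, the modality separation procedure, or the learning algorithm, so the sketch assumes the following: modalities are given as hypothetical column-index lists, each graph uses a Gaussian affinity, fusion is a convex combination with weights supplied by the caller (uniform by default, standing in for the paper's optimized weights), and learning uses a closed-form label propagation of the local-and-global-consistency type. All function and parameter names are illustrative.

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    """Gaussian affinity matrix with a zero diagonal (assumed graph type)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def normalize(W):
    """Symmetric normalization D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def fuse_graphs(graphs, weights):
    """Convex combination of per-modality graphs.

    The weights here are supplied by the caller; the paper instead derives
    them from pairwise relationships among training samples.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * S for w, S in zip(weights, graphs))

def propagate(S, Y, alpha=0.9):
    """Closed-form label propagation: F = (I - alpha * S)^{-1} Y."""
    n = S.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * S, Y)

def multi_graph_ssl(X, modalities, labels, weights=None, alpha=0.9, sigma=1.0):
    """Predict a class for every sample; labels use -1 for unlabeled samples."""
    # One normalized graph per modality (a modality is a list of feature columns).
    graphs = [normalize(rbf_affinity(X[:, cols], sigma)) for cols in modalities]
    if weights is None:
        weights = np.ones(len(graphs))  # uniform fusion as a placeholder
    S = fuse_graphs(graphs, weights)

    # One-hot label matrix; unlabeled rows stay all-zero.
    n_classes = int(labels.max()) + 1
    Y = np.zeros((X.shape[0], n_classes))
    idx = np.flatnonzero(labels >= 0)
    Y[idx, labels[idx]] = 1.0

    F = propagate(S, Y, alpha)
    return F.argmax(axis=1)
```

A call such as `multi_graph_ssl(X, [[0, 1], [2, 3]], labels)` treats columns 0 and 1 as one modality and columns 2 and 3 as another. Replacing the uniform `weights` with values learned from the labeled samples is the step where the paper's fusion criterion would apply.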
