Transductive multi-distance learning for video search

Graph-based semi-supervised learning approaches have proven effective and efficient at addressing the insufficiency of labeled training data in many real-world application areas, such as video annotation. However, the pairwise similarity metric between samples, a significant factor in these algorithms, has not been fully investigated. Specifically, in existing approaches the estimation of pairwise similarity between two samples relies on the spatial property of video data, whereas the temporal property, an essential characteristic of video data, is not embedded in the pairwise similarity measure. Accordingly, this paper proposes a novel framework for video annotation called Joint Spatio-Temporal Correlation Learning (JSTCL). The framework takes both the spatial and temporal properties of video data into account simultaneously to improve the estimation of pairwise similarity. We apply the proposed framework to video annotation and report superior performance compared to key existing approaches on the benchmark TRECVID data set.
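
The general idea of folding temporal information into a graph affinity can be illustrated with a minimal sketch. It is not the JSTCL formulation, which the abstract does not specify. It assumes, purely for illustration, that the joint similarity is the product of a Gaussian kernel on feature distance and a Gaussian kernel on temporal distance, and it propagates labels with the standard closed form F = (I - alpha S)^-1 Y from the local and global consistency method. The function names, the bandwidths sigma_s and sigma_t, and the toy data are all hypothetical.

```python
import numpy as np


def joint_affinity(features, timestamps, sigma_s=1.0, sigma_t=1.0):
    """Assumed joint affinity: spatial Gaussian kernel times temporal Gaussian kernel."""
    features = np.asarray(features, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    # Squared Euclidean distance between every pair of feature vectors.
    sq_feat = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    # Squared difference between every pair of timestamps.
    sq_time = (timestamps[:, None] - timestamps[None, :]) ** 2
    W = np.exp(-sq_feat / (2 * sigma_s ** 2)) * np.exp(-sq_time / (2 * sigma_t ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def propagate_labels(W, Y, alpha=0.9):
    """Closed-form propagation F = (I - alpha * S)^-1 Y with S = D^-1/2 W D^-1/2."""
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * S, Y)


# Toy data (hypothetical): six shots forming two groups in feature space and time.
features = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
            [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]
timestamps = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]

# One labeled shot per class; all-zero rows are unlabeled.
Y = np.zeros((6, 2))
Y[0, 0] = 1.0
Y[5, 1] = 1.0

W = joint_affinity(features, timestamps, sigma_s=1.0, sigma_t=2.0)
F = propagate_labels(W, Y, alpha=0.9)
pred = F.argmax(axis=1)  # predicted class index per shot
```

In this sketch the temporal kernel down-weights edges between shots that look alike but lie far apart in time. A product of kernels is only one possible way to combine the two cues; a weighted sum or a learned combination would be equally plausible under the abstract's description.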
