Transductive multi-label learning for video concept detection

Transductive video concept detection is an effective way to handle the lack of sufficient labeled videos. However, another issue, the multi-label interdependence, is not essentially addressed in the existing transductive methods. Most solutions only applied the transductive single-label approach to detect each individual concept separately, but ignoring the concept relation, or simply imposed the smoothness assumption over the multiple labels for each video, without indeed exploring the interdependence between the concepts. On the other hand, the semi-supervised extension of supervised multi-label classifiers, such as correlative multi-label support vector machines, is usually intractable and hence impractical due to the quite expensive computational cost. In this paper, we propose an effective transductive multi-label classification approach, which simultaneously models the labeling consistency between the visually similar videos and the multi-label interdependence for each video in an integrated framework. We compare the performance between the proposed approach and several representative transductive single-label and supervised multi-label classification approaches for the video concept detection task over the widely-used TRECVID data set. The comparative results demonstrate the superiority of the proposed approach.

[1]  Dale Schuurmans,et al.  Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields , 2006, NIPS.

[2]  Tao Mei,et al.  Correlative multi-label video annotation , 2007, ACM Multimedia.

[3]  Yi Liu,et al.  Semi-supervised Multi-label Learning by Constrained Non-negative Matrix Factorization , 2006, AAAI.

[4]  Zhi-Hua Zhou,et al.  ML-KNN: A lazy learning approach to multi-label learning , 2007, Pattern Recognit..

[5]  Vladimir Kolmogorov,et al.  An experimental comparison of min-cut/max- flow algorithms for energy minimization in vision , 2001, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Gang Chen,et al.  Semi-supervised Multi-label Learning by Solving a Sylvester Equation , 2008, SDM.

[7]  Meng Wang,et al.  Semi-automatic video annotation based on active learning with multiple complementary predictors , 2005, MIR '05.

[8]  Solomon Kullback,et al.  Approximating discrete probability distributions , 1969, IEEE Trans. Inf. Theory.

[9]  Meng Wang,et al.  Optimizing multi-graph learning: towards a unified video annotation scheme , 2007, ACM Multimedia.

[10]  Vladimir Kolmogorov,et al.  Convergent Tree-Reweighted Message Passing for Energy Minimization , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Rong Yan,et al.  Semi-supervised cross feature learning for semantic concept detection in videos , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[12]  Alexander Zien,et al.  Transductive support vector machines for structured variables , 2007, ICML '07.

[13]  Mikhail Belkin,et al.  Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples , 2006, J. Mach. Learn. Res..

[14]  Shih-Fu Chang,et al.  Columbia University’s Baseline Detectors for 374 LSCOM Semantic Visual Concepts , 2007 .

[15]  Jingrui He,et al.  Manifold-ranking based image retrieval , 2004, MULTIMEDIA '04.

[16]  Rong Yan,et al.  Mining Relationship Between Video Concepts using Probabilistic Graphical Models , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[17]  Ulf Brefeld,et al.  Semi-supervised learning for structured output variables , 2006, ICML.

[18]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[19]  Tao Mei,et al.  Graph-based semi-supervised learning with multiple labels , 2009, J. Vis. Commun. Image Represent..

[20]  Xiaojin Zhu,et al.  --1 CONTENTS , 2006 .

[21]  Tao Mei,et al.  Graph-based semi-supervised learning with multi-label , 2008, 2008 IEEE International Conference on Multimedia and Expo.

[22]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[23]  Ayhan Demiriz,et al.  Semi-Supervised Support Vector Machines , 1998, NIPS.

[24]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[25]  Meng Wang,et al.  Automatic video annotation by semi-supervised learning with kernel density estimation , 2006, MM '06.

[26]  Mikhail Belkin,et al.  Maximum Margin Semi-Supervised Learning for Structured Variables , 2005, NIPS 2005.

[27]  Shih-Fu Chang,et al.  Active Context-Based Concept Fusionwith Partial User Labels , 2006, 2006 International Conference on Image Processing.

[28]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[29]  Bernhard Schölkopf,et al.  Learning with Local and Global Consistency , 2003, NIPS.

[30]  Meng Wang,et al.  Structure-sensitive manifold ranking for video concept detection , 2007, ACM Multimedia.