A transductive multi-label learning approach for video concept detection

In this paper, we address two important issues in video concept detection: the insufficiency of labeled videos and the multi-label issue, i.e., that a single video can be associated with several concepts at once. Most existing solutions handle these two issues separately. We handle them jointly with a transductive multi-label classification approach that simultaneously models the labeling consistency among visually similar videos and the interdependence among the multiple labels of each video. We compare the proposed approach with several representative transductive and supervised multi-label classification approaches on the video concept detection task over the widely used TRECVID data set. The comparative results show that the proposed approach outperforms these baselines.
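To make the two modeled cues concrete, here is a minimal sketch of one common way to combine them in a single transductive objective. It is an illustrative assumption, not the formulation used in this paper. Let F be an n x K score matrix (n videos, K concepts), Y the known labels, and D a diagonal matrix with D_ii = 1 for labeled videos and 0 otherwise. The sketch minimizes tr(F^T L_W F) + mu tr(F L_C F^T) + lam tr((F - Y)^T D (F - Y)). Here L_W is the Laplacian of a video-similarity graph W, which encodes labeling consistency between visually similar videos, and L_C is the Laplacian of a concept-correlation matrix C, which encodes multi-label interdependence. Setting the gradient to zero gives L_W F + mu F L_C + lam D F = lam D Y, which the code solves with Gauss-Seidel sweeps. The Gaussian affinity, the values of mu and lam, the toy data, and the helper names gaussian_affinity and propagate_multilabel are all hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def gaussian_affinity(features: Sequence[Sequence[float]], sigma: float) -> Matrix:
    """Hypothetical dense video graph: W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), W_ii = 0."""
    n = len(features)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def propagate_multilabel(w: Matrix, c: Matrix, y: Sequence[Sequence[float]],
                         labeled: Sequence[bool], mu: float = 0.5,
                         lam: float = 10.0, iters: int = 200) -> Matrix:
    """Gauss-Seidel solve of L_W F + mu F L_C + lam D F = lam D Y (assumed objective).

    w: n x n video affinity (symmetric, non-negative, zero diagonal)
    c: K x K concept correlation (symmetric, non-negative, zero diagonal)
    y: n x K known labels (rows of unlabeled videos are ignored)
    labeled: True where a video's labels are known (D_ii = 1)
    Converges when the system matrix is positive definite, e.g. when every
    connected component of w contains at least one labeled video.
    """
    n, k = len(y), len(y[0])
    deg_w = [sum(row) for row in w]  # diagonal of the degree matrix of W
    deg_c = [sum(row) for row in c]  # diagonal of the degree matrix of C
    f = [[float(v) for v in row] for row in y]  # initialize scores from Y
    for _ in range(iters):
        for i in range(n):
            d_i = lam if labeled[i] else 0.0
            for a in range(k):
                # Video-graph term: pull toward visually similar videos' scores.
                num = d_i * y[i][a] + sum(w[i][j] * f[j][a] for j in range(n))
                # Concept term: pull toward correlated concepts of the same video.
                num += mu * sum(c[a][b] * f[i][b] for b in range(k))
                den = deg_w[i] + mu * deg_c[a] + d_i
                f[i][a] = num / den  # in-place update uses the newest values
    return f


# Toy usage (hypothetical data): two visual clusters of three videos each,
# two concepts, and one labeled video per cluster.
features = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
W = gaussian_affinity(features, sigma=0.5)
C = [[0.0, 0.2], [0.2, 0.0]]  # assumed weak correlation between the concepts
Y = [[1, 0], [0, 0], [0, 0], [0, 1], [0, 0], [0, 0]]
labeled = [True, False, False, True, False, False]
scores = propagate_multilabel(W, C, Y, labeled)
for row in scores:
    print([round(v, 3) for v in row])
```

In this toy run, the unlabeled videos in each cluster should inherit the dominant concept of their cluster's labeled video, while the concept-correlation term lends some score to the other concept. The pure-Python Gauss-Seidel loop only keeps the sketch dependency-free. At realistic scale, one would more likely use a sparse k-nearest-neighbor graph and a sparse linear solver.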
