Semi-supervised multi-instance multi-label learning for video annotation task

Traditional approaches to automatic video annotation usually represent a video clip as a flat feature vector, neglecting the natural structure of video data. A video clip is also often relevant to multiple concepts, so the video annotation task is inherently a Multi-Instance Multi-Label (MIML) learning problem. Because manually annotating videos is labor-intensive and time-consuming, this paper proposes a semi-supervised MIML approach, SSMIML, which exploits abundant unannotated videos to improve annotation performance. The approach takes label correlations into account and encourages similar instances to share similar multi-labels. Evaluation on TRECVID 2005 shows that the proposed approach outperforms several state-of-the-art methods.
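The idea that similar instances should share similar multi-labels can be illustrated with a generic graph-smoothness penalty. The sketch below is only an illustration under stated assumptions, not the SSMIML objective itself: the Gaussian similarity, the `sigma` parameter, and the function names `gaussian_similarity` and `smoothness_penalty` are all hypothetical choices made for this example.

```python
import math


def gaussian_similarity(x, y, sigma=1.0):
    """Similarity in (0, 1]; equals 1.0 for identical feature vectors.

    Assumption: a Gaussian kernel is used here purely for illustration.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def smoothness_penalty(instances, scores, sigma=1.0):
    """Sum over instance pairs of similarity times squared score difference.

    instances: list of instance feature vectors (e.g. one per key frame)
    scores:    list of per-instance label-score vectors, one entry per concept
    A small value means similar instances carry similar multi-label scores.
    """
    total = 0.0
    for i in range(len(instances)):
        for j in range(i + 1, len(instances)):
            w = gaussian_similarity(instances[i], instances[j], sigma)
            diff = sum((a - b) ** 2 for a, b in zip(scores[i], scores[j]))
            total += w * diff
    return total
```

In this toy setting the penalty is large when two nearly identical instances receive different label scores and close to zero when only distant instances disagree. Instances from unannotated clips can take part in such a term because it needs no ground-truth labels, which is one common way semi-supervised methods make use of unlabeled data.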
