Ensemble multi-instance multi-label learning approach for video annotation task

Automatic video annotation is an important ingredient for video indexing, browsing, and retrieval. Traditional studies represent one video clip with a flat feature vector; however, video data usually has natural structure. Moreover, a video clip is generally relevant to multiple concepts. Indeed, the video annotation task is inherently a Multi-Instance Multi-Label (MIML) learning problem. In this paper, we propose the En-MIMLSVM approach for the video annotation task. It considers the class imbalance and long time training problems of most video annotation tasks. In addition, a temporally consistent weighted multi-instance kernel is developed to take into account both the temporal consistency in video data and the significance of instances of different levels in pyramid representation. The En-MIMLSVM is evaluated on TRECVID 2005 data set, and the results show that it outperforms several state-of-the-art methods.

[1]  Zhi-Hua Zhou,et al.  Exploratory Undersampling for Class-Imbalance Learning , 2009, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[2]  Tao Mei,et al.  Temporally Consistent Gaussian Random Field for Video Semantic Analysis , 2007, 2007 IEEE International Conference on Image Processing.

[3]  Tao Mei,et al.  Joint multi-label multi-instance learning for image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  John R. Smith,et al.  A generalized multiple instance learning algorithm for large scale modeling of multimedia semantics , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[5]  Jun Yang,et al.  Exploring temporal consistency for video analysis and retrieval , 2006, MIR '06.

[6]  Trevor Darrell,et al.  The Pyramid Match Kernel: Efficient Learning with Sets of Features , 2007, J. Mach. Learn. Res..

[7]  Thomas Hofmann,et al.  Multi-Instance Multi-Label Learning with Application to Scene Classification , 2007 .

[8]  Thomas S. Huang,et al.  Factor graph framework for semantic video indexing , 2002, IEEE Trans. Circuits Syst. Video Technol..

[9]  Zhi-Hua Zhou,et al.  Multi-instance learning by treating instances as non-I.I.D. samples , 2008, ICML '09.

[10]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[11]  Thomas G. Dietterich,et al.  Solving the Multiple Instance Problem with Axis-Parallel Rectangles , 1997, Artif. Intell..

[12]  John R. Smith,et al.  IBM Research TRECVID-2009 Video Retrieval System , 2009, TRECVID.

[13]  Marcel Worring,et al.  The challenge problem for automated detection of 101 semantic concepts in multimedia , 2006, MM '06.

[14]  John R. Smith,et al.  Multimedia semantic indexing using model vectors , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[15]  Thomas Gärtner,et al.  Multi-Instance Kernels , 2002, ICML.

[16]  Nam Nguyen,et al.  A New SVM Approach to Multi-instance Multi-label Learning , 2010, 2010 IEEE International Conference on Data Mining.

[17]  Sanjeev Khudanpur,et al.  Hidden Markov models for automatic annotation and content-based retrieval of images and video , 2005, SIGIR '05.

[18]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Tao Mei,et al.  Multi-Layer Multi-Instance Learning for Video Concept Detection , 2008, IEEE Transactions on Multimedia.

[20]  Meng Wang,et al.  Correlative multilabel video annotation with temporal kernels , 2008, TOMCCAP.

[21]  Zhi-Hua Zhou,et al.  M3MIML: A Maximum Margin Method for Multi-instance Multi-label Learning , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[22]  Nikunj C. Oza,et al.  Online Ensemble Learning , 2000, AAAI/IAAI.

[23]  Zhi-Hua Zhou,et al.  Learning a distance metric from multi-instance multi-label data , 2009, CVPR.

[24]  Shuang-Hong Yang,et al.  Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora , 2009, NIPS.

[25]  Zhi-Hua Zhou,et al.  MIML: A Framework for Learning with Ambiguous Objects , 2008, ArXiv.