Correlative multi-label video annotation

Automatically annotating concepts for video is a key to semantic-level video browsing, search and navigation. The research on this topic evolved through two paradigms. The first paradigm used binary classification to detect each individual concept in a concept set. It achieved only limited success, as it did not model the inherent correlation between concepts, e.g., urban and building. The second paradigm added a second step on top of the individual concept detectors to fuse multiple concepts. However, its performance varies because the errors incurred in the first detection step can propagate to the second fusion step and therefore degrade the overall performance. To address the above issues, we propose a third paradigm which simultaneously classifies concepts and models correlations between them in a single step by using a novel Correlative Multi-Label (CML) framework. We compare the performance between our proposed approach and the state-of-the-art approaches in the first and second paradigms on the widely used TRECVID data set. We report superior performance from the proposed approach.

[1]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[2]  Shih-Fu Chang,et al.  Active Context-Based Concept Fusionwith Partial User Labels , 2006, 2006 International Conference on Image Processing.

[3]  Thomas S. Huang,et al.  Factor graph framework for semantic video indexing , 2002, IEEE Trans. Circuits Syst. Video Technol..

[4]  Milind R. Naphade Statistical techniques in video data management , 2002, 2002 IEEE Workshop on Multimedia Signal Processing..

[5]  Refractor Vision , 2000, The Lancet.

[6]  Jiebo Luo,et al.  Multilabel machine learning and its application to semantic scene classification , 2003, IS&T/SPIE Electronic Imaging.

[7]  John R. Smith,et al.  IBM Research TRECVID-2009 Video Retrieval System , 2009, TRECVID.

[8]  Marcel Worring,et al.  The challenge problem for automated detection of 101 semantic concepts in multimedia , 2006, MM '06.

[9]  Gerhard Winkler,et al.  Image analysis, random fields and dynamic Monte Carlo methods: a mathematical introduction , 1995, Applications of mathematics.

[10]  Dong Xu,et al.  Columbia University TRECVID-2006 Video Search and High-Level Feature Extraction , 2006, TRECVID.

[11]  Wei-Hao Lin,et al.  Confounded Expectations: Informedia at TRECVID 2004 , 2004, TRECVID.

[12]  Sunita Sarawagi,et al.  Discriminative Methods for Multi-labeled Classification , 2004, PAKDD.

[13]  Gerhard Winkler,et al.  Image Analysis, Random Fields and Markov Chain Monte Carlo Methods: A Mathematical Introduction , 2002 .

[14]  Yi Wu,et al.  Ontology-based multi-classification learning for video concept detection , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[15]  John R. Smith,et al.  Multimedia semantic indexing using model vectors , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[16]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[17]  Rong Yan,et al.  Multi-Lingual Broadcast News Retrieval , 2006, TRECVID.

[18]  Leszek Wojnar,et al.  Image Analysis , 1998 .

[19]  Andrew McCallum,et al.  Using Maximum Entropy for Text Classification , 1999 .

[20]  Jawaharlal Karmeshu,et al.  Entropy Measures, Maximum Entropy Principle and Emerging Applications , 2003 .

[21]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[22]  T. Stephenson Image analysis , 1992, Nature.