Discriminative model fusion for semantic concept detection and annotation in video

In this paper, we describe a general information fusion algorithm that incorporates multimodal cues when building user-defined semantic concept models. We compare this technique with a Bayesian network-based approach on a semantic concept detection task; results indicate that the fusion technique yields superior performance. We further demonstrate the approach by building classifiers for arbitrary concepts in a score space defined by a pre-deployed set of multimodal concepts. Results show that annotation for user-defined concepts, both inside and outside the pre-deployed set, is competitive with our best video-only models on the TREC Video 2002 corpus.
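To illustrate the score-space idea described above, the following is a minimal sketch, not the paper's implementation: it assumes each pre-deployed concept detector exposes a hypothetical `score(shot)` method, stacks the resulting confidences into a feature vector per shot, and trains a discriminative classifier (an SVM is assumed here purely for illustration) for a new user-defined concept on top of those scores.

```python
# Illustrative sketch only: the detector interface and the SVM choice are
# assumptions for clarity, not the paper's stated implementation.
import numpy as np
from sklearn.svm import SVC


def score_space_features(detectors, shots):
    """Stack each pre-deployed detector's confidence score into one
    feature vector per shot, forming the 'score space'."""
    return np.array([[d.score(shot) for d in detectors] for shot in shots])


def train_concept_classifier(detectors, labeled_shots, labels):
    """Fit a discriminative model for a user-defined concept in score space."""
    X = score_space_features(detectors, labeled_shots)
    clf = SVC(kernel="rbf", probability=True)  # hypothetical kernel choice
    clf.fit(X, labels)
    return clf
```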