Exploring knowledge of sub-domain in a multi-resolution bootstrapping framework for concept detection in news video

In this paper, we present a model based on a multi-resolution, multi-source and multi-modal (M3) bootstrapping framework that exploits knowledge of sub-domains for concept detection in news video. Because data in different sub-domains differ in their characteristics and distributions, we model and analyze the video in each sub-domain separately using a transductive framework. Within this framework, we propose a "pseudo-Vapnik combined error bound" to tackle the imbalanced distribution of training data in certain segments of sub-domains. To fuse multi-modal features effectively, we use multi-resolution inference and constraints that allow evidence from different modalities to support one another. Finally, we employ a bootstrapping technique that leverages unlabeled data to boost overall system performance. We evaluate the framework by detecting semantic concepts in the TRECVID 2004 dataset, and the experimental results demonstrate that the approach is effective.
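The abstract gives no algorithmic details. The following is therefore only a minimal sketch of two generic ideas it names: training a separate model for each sub-domain, and bootstrapping (self-training) a model with unlabeled data. It assumes a nearest-centroid classifier, a margin-based confidence threshold (`min_margin`), and hypothetical sub-domain names (`"sports"`, `"weather"`). All of these are illustrative choices, not the authors' method. The sketch does not implement the paper's transductive framework, its pseudo-Vapnik combined error bound, or its multi-resolution fusion.

```python
import math
from collections import defaultdict

# Hypothetical simplification: each shot is a numeric feature vector and each
# sub-domain gets its own nearest-centroid model (not the paper's classifier).


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def fit(labeled):
    """Build one centroid per label from (features, label) pairs."""
    by_label = defaultdict(list)
    for x, y in labeled:
        by_label[y].append(x)
    return {y: centroid(xs) for y, xs in by_label.items()}


def predict_with_margin(model, x):
    """Return (label, margin). Margin = gap between the two nearest centroids."""
    dists = sorted((math.dist(x, c), y) for y, c in model.items())
    best_d, best_y = dists[0]
    margin = dists[1][0] - best_d if len(dists) > 1 else float("inf")
    return best_y, margin


def self_train(labeled, unlabeled, rounds=3, per_round=1, min_margin=1.0):
    """Generic bootstrapping: repeatedly pseudo-label the most confident
    unlabeled points and retrain. Ambiguous points (margin < min_margin)
    are never added, which limits error reinforcement."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = fit(labeled)
        scored = sorted(
            ((predict_with_margin(model, x), i) for i, x in enumerate(pool)),
            key=lambda t: t[0][1],  # sort by margin, most confident first
            reverse=True,
        )
        accepted = [(y, i) for (y, m), i in scored[:per_round] if m >= min_margin]
        if not accepted:
            break  # nothing confident enough left; stop bootstrapping
        for y, i in accepted:
            labeled.append((pool[i], y))
        taken = {i for _, i in accepted}
        pool = [x for i, x in enumerate(pool) if i not in taken]
    return fit(labeled), labeled


def train_per_subdomain(labeled_by_sd, unlabeled_by_sd):
    """Train and bootstrap an independent model for every sub-domain."""
    return {
        sd: self_train(lab, unlabeled_by_sd.get(sd, []))[0]
        for sd, lab in labeled_by_sd.items()
    }


# Toy data with hypothetical sub-domains and 2-D features.
labeled_by_subdomain = {
    "sports": [((0.0, 0.0), "no_concept"), ((5.0, 5.0), "concept")],
    "weather": [((0.0, 1.0), "no_concept"), ((4.0, 4.0), "concept")],
}
unlabeled_by_subdomain = {
    "sports": [(4.5, 5.2), (0.3, 0.1), (2.4, 2.6)],
    "weather": [(3.9, 4.2)],
}

models = train_per_subdomain(labeled_by_subdomain, unlabeled_by_subdomain)
print(predict_with_margin(models["sports"], (4.8, 4.9))[0])  # concept
```

In the toy run, the ambiguous sports point `(2.4, 2.6)` lies almost halfway between the two centroids. It is therefore left unlabeled rather than pseudo-labeled. Keeping each sub-domain's model separate means that its centroids reflect only that sub-domain's data distribution.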
