Paper information - Semantic segmentation of video collections using boosted random fields

Semantic segmentation of video collections using boosted random fields

Multimedia documentalists need effective tools to organize and search large video collections. Semantic video structuring consists of automatically extracting the inner structure of a video collection from the raw data. Extracted automatically, this high-level information would provide metadata enabling a new range of applications for browsing and searching video collections. In this paper, we present a feature extraction process that provides a compact description of the audio, visual and text modalities. To reach the required semantic level, we then propose a contextual model that takes into account not only the link between features and labels but also the compatibility between labels associated with different modalities, which improves the consistency of the results. Boosted Random Fields are used to learn these relationships. They provide an iterative optimization framework for learning the model parameters and use boosting to reduce classification errors, avoid over-fitting and perform feature selection. We run experiments on the TRECVID corpus and show results that validate the approach relative to existing studies.

Thierry Pun | Stéphane Marchand-Maillet | Bruno Janvier | Eric Bruno

[1] Gal Chechik, et al. Extracting Relevant Structures with Side Information, 2002, NIPS.

[2] Naftali Tishby, et al. Unsupervised document classification using sequential information maximization, 2002, SIGIR '02.

[3] John D. Lafferty, et al. Statistical Models for Text Segmentation, 1999, Machine Learning.

[4] Yoram Singer, et al. Improved Boosting Algorithms Using Confidence-rated Predictions, 1998, COLT '98.

[5] Thierry Pun, et al. Information-Theoretic Framework for the Joint Temporal Partitioning and Representation of Video Data, 2003.

[6] Qi Tian, et al. A Two-Level Multi-Modal Approach for Story Segmentation of Large News Video Corpus, 2003, TRECVID.

[7] Shih-Fu Chang, et al. Discovery and fusion of salient multimodal features toward news story segmentation, 2003, IS&T/SPIE Electronic Imaging.

[8] Antonio Torralba, et al. Contextual Models for Object Detection Using Boosted Random Fields, 2004, NIPS.

[9] Jean-Luc Gauvain, et al. The LIMSI Broadcast News transcription system, 2002, Speech Commun.

[10] Shih-Fu Chang, et al. Generative, discriminative, and ensemble learning on multi-modal perceptual fusion toward news video story segmentation, 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[11] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.