Semantic segmentation of video collections using boosted random fields

Multimedia documentalists need effective tools to organize and search large video collections. Semantic video structuring consists of automatically extracting the inner structure of a video collection from the raw data. Such high-level information, if extracted automatically, would provide valuable metadata and enable a new range of applications for browsing and searching video collections. In this paper, we present a feature extraction process that provides a compact description of the audio, visual and text modalities. To reach the required semantic level, we then propose a contextual model that takes into account not only the link between features and labels but also the compatibility between labels associated with different modalities, which improves the consistency of the results. Boosted Random Fields are used to learn these relationships. They provide an iterative optimization framework for learning the model parameters and use the abilities of boosting to reduce classification errors, avoid over-fitting and perform feature selection. We experiment on the TRECvid corpus and show results that validate the approach against existing studies.