Video story segmentation with multi-modal features: experiments on TRECvid 2003

This paper describes the first steps taken by CLIPS/IMAG on the TRECVID story segmentation task. We mainly describe the multi-modal features used, which are drawn from the audio, video and text modalities, and their respective performance on the story segmentation task. We also present a preliminary system whose main advantage is that it requires relatively little training data. First experiments on the TRECVID 2003 evaluation set yield a recall of 0.613 and a precision of 0.467. We plan to participate in the official TRECVID 2004 story segmentation task with this system.
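The recall and precision figures above are scores over story boundaries. The sketch below shows one way such scores can be computed. It rests on several assumptions: boundaries are timestamps in seconds, and a detected boundary counts as correct when it lies within a tolerance window of a not-yet-matched reference boundary. The 5-second default and the greedy nearest-neighbour matching are illustrative choices, not the official TRECVID scoring procedure. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def count_hits(reference: Sequence[float], detected: Sequence[float],
               tolerance: float = 5.0) -> int:
    """Greedily match each reference boundary to the nearest unused detected
    boundary within `tolerance` seconds (assumed matching rule)."""
    det: List[float] = sorted(detected)
    used = [False] * len(det)
    hits = 0
    for r in sorted(reference):
        best_j, best_dist = -1, None
        for j, d in enumerate(det):
            dist = abs(d - r)
            if not used[j] and dist <= tolerance and (best_dist is None or dist < best_dist):
                best_j, best_dist = j, dist
        if best_j >= 0:
            used[best_j] = True  # each detected boundary may match only once
            hits += 1
    return hits


def precision_recall(reference: Sequence[float], detected: Sequence[float],
                     tolerance: float = 5.0) -> Tuple[float, float]:
    """Return (precision, recall) of detected story boundaries."""
    hits = count_hits(reference, detected, tolerance)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    p, r = precision_recall([10.0, 50.0, 100.0], [12.0, 49.0, 70.0, 200.0])
    print(p, r)                     # two of four detections match two of three references
    print(f_measure(0.467, 0.613))  # F-measure for the reported figures
```

With the reported figures, `f_measure(0.467, 0.613)` gives an F-measure of roughly 0.53. The greedy matcher keeps the sketch short. An optimal one-to-one assignment could give slightly different counts when boundaries are densely packed.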
