Paper information - News video story segmentation using fusion of multi-level multi-modal features in TRECVID 2003

We present new results on news video story segmentation and classification from the TRECVID 2003 video retrieval benchmark. We apply and extend the maximum entropy statistical model to effectively fuse diverse features from multiple levels and modalities, including visual, audio, and text. The features include motion, face, music/speech types, prosody, and high-level text segmentation information. The statistical fusion model automatically discovers the features that are relevant to detecting story boundaries. One novel aspect of our method is the use of a feature wrapper to handle different types of features: asynchronous, discrete, continuous, and delta. We also developed several novel features related to prosody. Using the large news video set from the TRECVID 2003 benchmark, we demonstrate satisfactory performance (F1 measure up to 0.76) and, more importantly, observe an interesting opportunity for further improvement.
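The abstract does not spell out how the feature wrapper or the fusion model is implemented, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each raw feature is converted to a binary indicator at a candidate boundary time and that the fusion step is a binary maximum entropy model, which is equivalent to logistic regression. All function names, thresholds, window sizes, and the toy data are hypothetical.

```python
import math

# Hypothetical wrappers: each maps one raw feature to a 0/1 indicator
# at a candidate boundary time t (in seconds).

def wrap_async(event_times, t, tol=2.5):
    """Asynchronous feature: 1 if any event lies within tol seconds of t."""
    return 1 if any(abs(e - t) <= tol for e in event_times) else 0

def wrap_discrete(label, category):
    """Discrete feature: 1 if the observed label equals the given category."""
    return 1 if label == category else 0

def wrap_continuous(value, threshold):
    """Continuous feature: 1 if the value reaches the threshold."""
    return 1 if value >= threshold else 0

def wrap_delta(series, t, window, threshold):
    """Delta feature: 1 if the mean of a sampled signal changes by at
    least threshold between the windows before and after t."""
    before = [v for s, v in series if t - window <= s < t]
    after = [v for s, v in series if t <= s < t + window]
    if not before or not after:
        return 0
    delta = sum(after) / len(after) - sum(before) / len(before)
    return 1 if abs(delta) >= threshold else 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Probability that a candidate with binary features x is a boundary."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def train_maxent(X, y, epochs=500, lr=0.5):
    """Fit a binary maximum entropy model by batch gradient ascent on the
    average log-likelihood. The last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def f1_score(pred, truth):
    """F1 measure: harmonic mean of precision and recall."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy data (made up): feature 0 is informative, feature 1 is not.
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
weights = train_maxent(X, y)
decisions = [1 if predict(weights, x) > 0.5 else 0 for x in X]
```

In this toy setup the learned weight on the informative feature grows while the weight on the uninformative one stays near zero, which illustrates in miniature how a fusion model of this kind can indicate which features contribute to boundary detection.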
