Automatic Story Segmentation for TV News Video Using Multiple Modalities

While video content is often stored in rather large files or broadcast in continuous streams, users are often interested in retrieving only a particular passage on a topic of interest to them. It is therefore necessary to split video documents or streams into shorter segments that correspond to appropriate retrieval units. We propose a method for automatically segmenting TV news videos into stories using multiple descriptors. The selected multimodal features are complementary and provide useful cues about story boundaries. Once extracted, these features are expanded with a local temporal context and combined through early fusion. Story boundaries are then predicted using machine learning techniques. We evaluate the system in experiments that follow the data and protocol of the TRECVID 2003 story boundary detection task, and we show that the proposed approach outperforms state-of-the-art methods while requiring only a very small amount of manual annotation.
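The pipeline described above (per-modality features, expansion with a local temporal context, early fusion, then a learned boundary classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that candidate boundary points (for example, shot boundaries) are already given and that each modality yields a fixed-length numeric vector per candidate. The feature names, the context width `k`, the toy data, and the use of logistic regression trained by stochastic gradient descent are all hypothetical stand-ins for the descriptors and learner used in the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def add_temporal_context(features: Sequence[Sequence[float]], k: int) -> List[Vector]:
    """For each candidate point i, concatenate the vectors of points i-k..i+k.

    Positions outside the sequence are zero-padded so every row has the same length.
    """
    n, dim = len(features), len(features[0])
    expanded: List[Vector] = []
    for i in range(n):
        row: Vector = []
        for j in range(i - k, i + k + 1):
            row.extend(features[j] if 0 <= j < n else [0.0] * dim)
        expanded.append(row)
    return expanded


def early_fusion(modalities: Sequence[Sequence[Sequence[float]]]) -> List[Vector]:
    """Early fusion: concatenate all modalities into one vector per candidate point."""
    n = len(modalities[0])
    if any(len(m) != n for m in modalities):
        raise ValueError("every modality must describe the same candidate points")
    return [[x for m in modalities for x in m[i]] for i in range(n)]


def build_design_matrix(modalities, k: int) -> List[Vector]:
    """Expand each modality with local context first, then fuse (order given in the abstract)."""
    return early_fusion([add_temporal_context(m, k) for m in modalities])


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, epochs: int = 500, lr: float = 0.5) -> Tuple[Vector, float]:
    """Hypothetical stand-in classifier: logistic regression trained by SGD."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Gradient of the log-loss with respect to the linear score z.
            g = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict(w, b, X, threshold: float = 0.5) -> List[int]:
    """Label a candidate point as a story boundary (1) when P(boundary) >= threshold."""
    return [int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= threshold) for xi in X]


# Hypothetical toy data: one value per modality at each of 8 candidate points.
# VIDEO = "next shot is an anchor shot", AUDIO = "pause detected at the candidate".
VIDEO = [[1.0], [0.0], [0.0], [1.0], [0.0], [0.0], [1.0], [0.0]]
AUDIO = [[1.0], [0.0], [1.0], [1.0], [0.0], [0.0], [1.0], [0.0]]
LABELS = [1, 0, 0, 1, 0, 0, 1, 0]  # 1 = a story boundary at this candidate

if __name__ == "__main__":
    X = build_design_matrix([VIDEO, AUDIO], k=1)  # 3 context values x 2 modalities
    w, b = train_logistic(X, LABELS)
    print(predict(w, b, X))
```

Fusing early, before classification, lets a single classifier weigh cross-modal combinations (for instance, a pause that coincides with an anchor shot), while the context window lets it see what the neighbouring candidates look like. Training and predicting on the same toy points here only demonstrates the data flow; a real evaluation would score held-out broadcasts.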