Story segmentation and detection of commercials in broadcast news video

The Informedia Digital Library Project supports full-content indexing and retrieval of text, audio, and video material, and segmentation is an integral part of that process. The success of the project hinges on two critical assumptions: that we can extract sufficiently accurate speech recognition transcripts from the broadcast audio, and that we can segment the broadcast into video paragraphs, or stories, that are useful for information retrieval. In previous papers we have shown that speech recognition is accurate enough for information retrieval over pre-segmented video news stories. We now address segmentation and demonstrate that a fully automatic system can extract story boundaries using the available audio, video, and closed-captioning cues. The story segmentation step for the Informedia Digital Video Library splits full-length news broadcasts into individual news stories; during this phase the system also labels commercials as separate "stories". We explain how the Informedia system takes advantage of the closed captioning frequently broadcast with the news, how it extracts timing information by aligning the closed captions with the output of the speech recognizer, and how it integrates closed-caption cues with the results of image and audio processing.
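
The caption-to-speech alignment mentioned above can be illustrated with a minimal sketch. It assumes the captions arrive as a plain word list without timestamps and the recognizer output as (word, start time) pairs. The function name `align_captions`, both data layouts, and the use of Python's `difflib.SequenceMatcher` as the aligner are assumptions made for illustration; the abstract does not say which alignment method the system uses.

```python
from difflib import SequenceMatcher


def align_captions(caption_words, asr_words):
    """Transfer recognizer timestamps onto caption words.

    caption_words: list of str (hypothetical layout, no timing).
    asr_words: list of (word, start_seconds) pairs (hypothetical layout).
    Returns a list of (caption_word, start_seconds or None).
    """
    cap = [w.lower() for w in caption_words]
    hyp = [w.lower() for w, _ in asr_words]
    times = [None] * len(cap)

    # SequenceMatcher stands in for the aligner here; it finds runs of
    # identical words that appear in the same order in both sequences.
    matcher = SequenceMatcher(a=cap, b=hyp, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            # Copy the start time of each matched recognizer word.
            times[block.a + k] = asr_words[block.b + k][1]

    return list(zip(caption_words, times))


captions = ["Good", "evening", "from", "Washington"]
recognized = [("good", 0.0), ("even", 0.4), ("from", 0.9), ("washington", 1.1)]
aligned = align_captions(captions, recognized)
```

Caption words the recognizer got wrong ("evening" in this toy input) are left with no timestamp; a fuller system might interpolate their times from the matched neighbors.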
