Integrating visual, audio and text analysis for news video

We present a system developed for content-based broadcast news video browsing for home users. Three main factors distinguish our work from similar systems. First, we integrate image and audio analysis results to identify news segments. Second, we use video OCR technology to detect text in frames, which provides a good source of textual information for story classification when transcripts and closed captions are not available. Finally, natural language processing (NLP) technologies perform automated categorization of news stories based on the text obtained from closed captions or the video OCR process. Building on these video structure and content analysis technologies, we have developed two advanced video browsers for home users: an intelligent highlight player and an HTML-based video browser.
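As a rough illustration of the story-categorization step, the sketch below scores the text recovered from closed captions or video OCR against simple per-category keyword profiles. The category names, keyword lists, and the `categorize_story` helper are hypothetical; the paper states only that NLP technologies are applied to the recovered text, not this particular scheme.

```python
# Hypothetical sketch: assign a news story to a category from its recovered
# text (closed captions or video OCR output) using keyword-profile scoring.
# Categories and keywords are illustrative only, not taken from the paper.
from collections import Counter
import re

CATEGORY_KEYWORDS = {
    "politics": {"election", "senate", "president", "vote", "congress"},
    "sports":   {"game", "score", "team", "season", "coach"},
    "weather":  {"forecast", "storm", "temperature", "rain", "snow"},
    "finance":  {"stocks", "market", "dow", "earnings", "trade"},
}

def categorize_story(text: str) -> str:
    """Return the category whose keyword profile best matches the story text."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        category: sum(tokens[word] for word in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"

if __name__ == "__main__":
    ocr_text = "The president addressed congress ahead of the election vote."
    print(categorize_story(ocr_text))  # -> politics
```

In practice, the keyword profiles would be replaced by a trained text classifier over the caption/OCR text, but the control flow (recover text per story segment, then categorize it) follows the pipeline the abstract describes.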
