Audio-video summarization of TV news using speech recognition and shot change detection

This paper presents an approach to audio-video summarization of TV news that provides concise information about the content while preserving the essential message of the original. Anchor speech and field report video are processed separately. First, the speech signal is automatically transcribed, and a confidence measure that considers syntactic and semantic relations is used to estimate the reliability of the recognized words. For video skimming, the RGB color histogram difference is used to segment the video into shots and to evaluate the smoothness of image concatenation. The extracted anchor speech and the field report image sequence are then combined into the summary output. Experimental results indicate that the proposed approach effectively extracts important speech segments and produces a concise video sequence.
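As an illustration of the shot segmentation step, the following is a minimal sketch of shot change detection by RGB color histogram difference. It is not the authors' implementation: the function names, the number of bins, the normalized L1 distance, and the fixed threshold are all assumptions chosen for the example, and the abstract does not specify any of them.

```python
import numpy as np

def rgb_histogram(frame, bins=8):
    """Concatenated per-channel histogram of an H x W x 3 uint8 frame.

    The result is normalized to sum to 1 so frames of different
    sizes are comparable. The bin count is an assumed parameter.
    """
    hists = [
        np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]
    h = np.concatenate(hists).astype(np.float64)
    return h / h.sum()

def histogram_difference(frame_a, frame_b, bins=8):
    """Half the L1 distance between two normalized histograms, in [0, 1]."""
    diff = rgb_histogram(frame_a, bins) - rgb_histogram(frame_b, bins)
    return 0.5 * np.abs(diff).sum()

def detect_shot_changes(frames, threshold=0.3, bins=8):
    """Return indices i where frame i starts a new shot.

    A boundary is declared when the difference between consecutive
    frames exceeds the (assumed, hand-picked) threshold.
    """
    boundaries = []
    for i in range(1, len(frames)):
        if histogram_difference(frames[i - 1], frames[i], bins) > threshold:
            boundaries.append(i)
    return boundaries
```

The same difference measure could plausibly be reused to score how smoothly two selected shots join, for example by comparing the last frame of one shot with the first frame of the next, with a lower value suggesting a smoother concatenation. How the paper actually scores smoothness is not stated in the abstract.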
