Summarizing Audiovisual Contents of a Video Program

In this paper, we focus on video programs that are intended to disseminate information and knowledge such as news, documentaries, seminars, etc, and present an audiovisual summarization system that summarizes the audio and visual contents of the given video separately, and then integrating the two summaries with a partial alignment. The audio summary is created by selecting spoken sentences that best present the main content of the audio speech while the visual summary is created by eliminating duplicates/redundancies and preserving visually rich contents in the image stream. The alignment operation aims to synchronize each spoken sentence in the audio summary with its corresponding speaker′s face and to preserve the rich content in the visual summary. A Bipartite Graph-based audiovisual alignment algorithm is developed to efficiently find the best alignment solution that satisfies these alignment requirements. With the proposed system, we strive to produce a video summary that: (1) provides a natural visual and audio content overview, and (2) maximizes the coverage for both audio and visual contents of the original video without having to sacrifice either of them.

[1]  W. Press,et al.  Numerical Recipes in Fortran: The Art of Scientific Computing.@@@Numerical Recipes in C: The Art of Scientific Computing. , 1994 .

[2]  Dragutin Petkovic,et al.  CueVideo: automated multimedia indexing and retrieval , 1999, MULTIMEDIA '99.

[3]  Aaas News,et al.  Book Reviews , 1893, Buffalo Medical and Surgical Journal.

[4]  Bernard Mérialdo,et al.  Generating summaries of multi-episode video , 2001, IEEE International Conference on Multimedia and Expo, 2001. ICME 2001..

[5]  F. A. Seiler,et al.  Numerical Recipes in C: The Art of Scientific Computing , 1989 .

[6]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[7]  Yoshinobu Tonomura,et al.  VideoMAP and VideoSpaceIcon: tools for anatomizing video content , 1993, INTERCHI.

[8]  Boon-Lock Yeo,et al.  Video browsing using clustering and scene transitions on compressed sequences , 1995, Electronic Imaging.

[9]  Andreas Girgensohn,et al.  Time-Constrained Keyframe Selection Technique , 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.

[10]  Jade Goldstein-Stewart,et al.  Summarizing text documents: sentence selection and evaluation metrics , 1999, SIGIR '99.

[11]  Takafumi Miyatake,et al.  IMPACT: an interactive natural-motion-picture dedicated multimedia authoring system , 1991, CHI.

[12]  A. Murat Tekalp,et al.  Multiscale content extraction and representation for video indexing , 1997, Other Conferences.

[13]  David S. Doermann,et al.  Video summarization by curve simplification , 1998, MULTIMEDIA '98.

[14]  Xin Liu,et al.  Generic text summarization using relevance measure and latent semantic analysis , 2001, SIGIR '01.

[15]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[16]  Michael A. Smith,et al.  Video skimming and characterization through the combination of image and language understanding techniques , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.