Automated video program summarization using speech transcripts

Compact representations of video data greatly enhance efficient video browsing. Such representations provide the user with information about the content of the particular sequence being examined while preserving the essential message. We propose a method to automatically generate video summaries using transcripts obtained by automatic speech recognition. We divide the full program into segments based on pause detection and derive a score for each segment based on the frequencies of the words and bigrams it contains. A summary is then generated by selecting the segments with the highest score-to-duration ratios while at the same time maximizing the coverage of the summary over the full program. We developed an experimental design and a user study to judge the quality of the generated video summaries. We compared the informativeness of summaries produced by the proposed algorithm with those of two other algorithms for three different programs. The results of the user study demonstrate that the proposed algorithm produces more informative summaries than the other two algorithms.
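
The scoring and selection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the raw-count scoring formula, the greedy selection under a time budget, and the names `score_segments` and `summarize` are all assumptions made for illustration. Pause detection is assumed to have already produced the segments, and the coverage term mentioned in the abstract is left out.

```python
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def score_segments(segments):
    """Score each segment from program-wide word and bigram counts.

    segments: list of (start_sec, end_sec, transcript_text) tuples,
    assumed to come from a pause-detection step not shown here.
    """
    word_counts = Counter()
    bigram_counts = Counter()
    tokenized = []
    for _, _, text in segments:
        words = tokenize(text)
        tokenized.append(words)
        word_counts.update(words)
        bigram_counts.update(zip(words, words[1:]))

    scores = []
    for words in tokenized:
        # Hypothetical formula: sum the program-wide counts of the
        # distinct words and distinct bigrams in the segment.
        score = sum(word_counts[w] for w in set(words))
        score += sum(bigram_counts[b] for b in set(zip(words, words[1:])))
        scores.append(score)
    return scores


def summarize(segments, budget_sec):
    """Greedily pick segments by score-to-duration ratio within a budget.

    Returns the indices of the chosen segments in program order.
    The coverage criterion from the abstract is not modeled here.
    """
    scores = score_segments(segments)

    def duration(i):
        start, end, _ = segments[i]
        return max(end - start, 1e-9)  # guard against zero-length segments

    order = sorted(range(len(segments)),
                   key=lambda i: scores[i] / duration(i),
                   reverse=True)
    chosen, used = [], 0.0
    for i in order:
        if used + duration(i) <= budget_sec:
            chosen.append(i)
            used += duration(i)
    return sorted(chosen)
```

Greedy selection by ratio is only a heuristic for this knapsack-style problem; an exact 0-1 knapsack solver could be substituted for the loop in `summarize` without changing the scoring step.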
