Event-Driven Video Abstraction and Visualization

In this paper, we propose a new video summarization procedure that produces a dynamic (video) abstract of the original video sequence. Our technique compactly summarizes video data while preserving its original temporal characteristics (visual activity) and its semantically essential information. It relies on adaptive nonlinear sampling, in which the local sampling rate is directly proportional to the amount of visual activity in localized sub-shot units of the video. To obtain very short yet semantically meaningful summaries, we also present an event-oriented abstraction scheme in which two semantic events, emotional dialogue and violent action, are characterized and abstracted into the video summary before all other events. If the length of the summary permits, other non-key events are then added. The resulting video abstract is highly compact.
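The activity-proportional sampling described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `allocate_frames` and `sample_indices`, the use of one non-negative activity score per sub-shot unit, the largest-remainder rounding, and the evenly spaced selection within a unit are all assumptions made for illustration.

```python
from typing import List, Sequence


def allocate_frames(activity: Sequence[float], budget: int) -> List[int]:
    """Split a summary budget (in frames) across sub-shot units so that each
    unit's share is proportional to its visual-activity score.

    Assumes the scores are non-negative; how they are measured is left open.
    """
    total = sum(activity)
    if budget <= 0 or total <= 0:
        return [0] * len(activity)

    # Ideal (fractional) share of the budget for each unit.
    shares = [budget * a / total for a in activity]
    counts = [int(s) for s in shares]

    # Hand the frames lost to truncation to the units with the largest
    # fractional remainders, so the counts add up to the budget exactly.
    leftover = budget - sum(counts)
    by_remainder = sorted(
        range(len(activity)),
        key=lambda i: shares[i] - counts[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def sample_indices(start: int, length: int, count: int) -> List[int]:
    """Pick up to `count` evenly spaced frame indices from a unit that begins
    at frame `start` and spans `length` frames."""
    count = min(count, length)  # a unit cannot supply more frames than it has
    return [start + (k * length) // count for k in range(count)]


if __name__ == "__main__":
    # Three hypothetical sub-shot units: low, high, and zero activity.
    scores = [1.0, 3.0, 0.0]
    print(allocate_frames(scores, budget=8))
    print(sample_indices(start=0, length=10, count=2))
```

In this sketch a unit with three times the activity receives three times as many summary frames, and a static unit receives none. Because `sample_indices` caps the count at the unit's length, a very short but highly active unit may leave part of its allocation unused; a fuller version would redistribute that surplus.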
