Incorporating Audio Cues into Dialog and Action Scene Extraction

In this paper, we present an approach to extract scenes in video. The approach is top-down and uses video editing rules and audio cues to extract simple dialog and action scenes. The underlying model is a finite state machine coupled with audio cues that are determined using an audio classifier.

[1]  Arif Ghafoor,et al.  Object-oriented conceptual modeling of video data , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[2]  Malcolm Slaney,et al.  Construction and evaluation of a robust multifeature speech/music discriminator , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3]  Douglas Keislar,et al.  Content-Based Classification, Search, and Retrieval of Audio , 1996, IEEE Multim..

[4]  Peter Kabal,et al.  Speech/music discrimination for multimedia applications , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[5]  Dragutin Petkovic,et al.  Towards robust features for classifying audio in the CueVideo system , 1999, MULTIMEDIA '99.

[6]  Yoshinobu Tonomura,et al.  VideoMAP and VideoSpaceIcon: tools for anatomizing video content , 1993, INTERCHI.

[7]  Wolfgang Effelsberg,et al.  Scene Determination Based on Video and Audio Features , 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.

[8]  Thomas S. Huang,et al.  Exploring video structure beyond the shots , 1998, Proceedings. IEEE International Conference on Multimedia Computing and Systems (Cat. No.98TB100241).

[9]  Kikukawa Takeshi,et al.  Development of an Automatic Summary Editing System for the Audio Visual Resources. , 1992 .

[10]  Alan Hanjalic,et al.  Automatically Segmenting Movies into Logical Story Units , 1999, VISUAL.

[11]  Lei Chen,et al.  Rule-based scene extraction from video , 2002, Proceedings. International Conference on Image Processing.

[12]  Wolfgang Effelsberg,et al.  Automatic audio content analysis , 1997, MULTIMEDIA '96.

[13]  Masahito Hirakawa,et al.  Content-based retrieval of video data by the grammar of film , 1997, Proceedings. 1997 IEEE Symposium on Visual Languages (Cat. No.97TB100180).

[14]  John Saunders,et al.  Real-time discrimination of broadcast speech/music , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[15]  Rainer Lienhart,et al.  Scene Determination Based on Video and Audio Features , 2004, Multimedia Tools and Applications.

[16]  Lie Lu,et al.  A robust audio classification and segmentation method , 2001, MULTIMEDIA '01.

[17]  Behzad Shahraray,et al.  Scene change detection and content-based sampling of video sequences , 1995, Electronic Imaging.

[18]  Mingchun Liu,et al.  A study on content-based classification and retrieval of audio database , 2001, Proceedings 2001 International Database Engineering and Applications Symposium.

[19]  Boon-Lock Yeo,et al.  Time-constrained clustering for segmentation of video into story units , 1996, Proceedings of 13th International Conference on Pattern Recognition.

[20]  Michael J. Carey,et al.  A comparison of features for speech, music discrimination , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[21]  C.-C. Jay Kuo,et al.  Audio content analysis for online audiovisual data segmentation and classification , 2001, IEEE Trans. Speech Audio Process..

[22]  Guojun Lu,et al.  A technique towards automatic audio classification and retrieval , 1998, ICSP '98. 1998 Fourth International Conference on Signal Processing (Cat. No.98TH8344).

[23]  Akio Nagasaka,et al.  Automatic Video Indexing and Full-Video Search for Object Appearances , 1991, VDB.

[24]  Stephen W. Smoliar,et al.  Video parsing, retrieval and browsing: an integrated and content-based solution , 1997, MULTIMEDIA '95.

[25]  Atreyi Kankanhalli,et al.  Automatic partitioning of full-motion video , 1993, Multimedia Systems.

[26]  Stan Z. Li,et al.  Content-based audio classification and retrieval using the nearest feature line method , 2000, IEEE Trans. Speech Audio Process..

[27]  D. Arijon,et al.  Grammar of Film Language , 1976 .

[28]  Guojun Lu,et al.  An investigation of automatic audio classification and segmentation , 2000, WCC 2000 - ICSP 2000. 2000 5th International Conference on Signal Processing Proceedings. 16th World Computer Congress 2000.

[29]  James L. Hein Theory of computation: an introduction , 1996 .

[30]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[31]  Joan E. Hart,et al.  Film Directing Shot by Shot: Visualizing from Concept to Screen , 1991 .