Dialogue scene detection in movies using low and mid-level visual features

This paper describes an approach for detecting dialogue scenes in movies. The approach uses automatically extracted low- and mid-level visual features that characterise the visual content of individual shots. These features are then combined using a state transition machine that models the shot-level temporal characteristics of the scene under investigation. The choice of visual features is motivated by a consideration of formal film syntax. The system is designed so that the same analysis can be applied to detect different types of scenes; in this paper we focus on dialogue sequences, as these are the most prevalent scenes in the movies considered to date.
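To make the idea of combining per-shot evidence with a state transition machine concrete, the following is a minimal sketch and not the paper's actual model. It assumes each shot has already been reduced to a single hypothetical boolean flag (`True` when the shot's visual features look dialogue-like); the function name `detect_scenes` and the parameters `min_shots` and `max_gap` are illustrative assumptions, not values taken from the paper.

```python
from typing import List, Tuple


def detect_scenes(
    shot_flags: List[bool],
    min_shots: int = 3,  # assumed: matching shots needed to accept a scene
    max_gap: int = 1,    # assumed: non-matching shots tolerated inside a scene
) -> List[Tuple[int, int]]:
    """Return (first_shot, last_shot) index pairs for detected sequences.

    Two implicit states: OUTSIDE (start is None) and INSIDE (start is set).
    A matching shot moves OUTSIDE -> INSIDE; more than `max_gap` consecutive
    non-matching shots moves INSIDE -> OUTSIDE.
    """
    scenes: List[Tuple[int, int]] = []
    start = None     # index of the first matching shot of the current run
    last_hit = None  # index of the most recent matching shot
    hits = 0         # number of matching shots in the current run

    for i, flag in enumerate(shot_flags):
        if flag:
            if start is None:  # transition OUTSIDE -> INSIDE
                start = i
                hits = 0
            hits += 1
            last_hit = i
        elif start is not None and i - last_hit > max_gap:
            # Gap too long: transition INSIDE -> OUTSIDE, keep run if long enough.
            if hits >= min_shots:
                scenes.append((start, last_hit))
            start = None

    # Close a run that is still open at the end of the shot list.
    if start is not None and hits >= min_shots:
        scenes.append((start, last_hit))
    return scenes


if __name__ == "__main__":
    flags = [False, True, True, False, True, True, False, False, True, False]
    print(detect_scenes(flags))
```

Under these assumptions, detecting a different scene type would only require a different per-shot flag and different thresholds; the transition logic itself stays the same.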
