Semantic classification of movie scenes using finite state machines

This paper addresses the problem of classifying scenes from feature films into semantic categories and proposes a robust framework for it. Finite state machines (FSMs) are proposed as a suitable tool for detecting and classifying scenes, and their use is demonstrated for three types of movie scenes: conversation, suspense and action. The framework combines the structural information of the scenes with low-level and mid-level features. The low-level video features include motion and audio energy, and a mid-level feature, body, is also used. The transitions of each FSM are determined by the features of each shot in the scene. The FSMs were tested on over 80 clips, with convincing results.
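As a rough illustration of how per-shot features might drive FSM transitions, the sketch below runs a small state machine over a sequence of shots. It is not the FSM design from the paper: the `Shot` fields, the thresholds, the symbol names and the three-state "action" machine are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Shot:
    """Per-shot features (field names and [0, 1] scales are assumptions)."""
    motion: float        # low-level: motion intensity
    audio_energy: float  # low-level: audio energy
    has_body: bool       # mid-level: whether a body was detected


def shot_symbol(shot: Shot, motion_thr: float = 0.5, audio_thr: float = 0.5) -> str:
    """Map a shot's features to a discrete input symbol (hypothetical rule)."""
    if shot.motion >= motion_thr and shot.audio_energy >= audio_thr:
        return "active"
    if shot.has_body:
        return "person"
    return "other"


# Hypothetical FSM for an action scene: accept once three consecutive
# shots are "active". Any missing (state, symbol) pair resets to "start".
ACTION_FSM: Dict[Tuple[str, str], str] = {
    ("start", "active"): "a1",
    ("a1", "active"): "a2",
    ("a2", "active"): "accept",
}


def run_fsm(transitions: Dict[Tuple[str, str], str],
            shots: Iterable[Shot],
            start: str = "start",
            accept: str = "accept") -> bool:
    """Feed the shots through the FSM; return True if it reaches `accept`."""
    state = start
    for shot in shots:
        if state == accept:  # the accepting state is absorbing
            break
        state = transitions.get((state, shot_symbol(shot)), start)
    return state == accept


if __name__ == "__main__":
    chase = [Shot(0.9, 0.8, False)] * 3
    dialogue = [Shot(0.1, 0.3, True)] * 4
    print(run_fsm(ACTION_FSM, chase))
    print(run_fsm(ACTION_FSM, dialogue))
```

In a setup like this, one such machine would be defined per scene category (conversation, suspense, action), and a scene would be assigned to the category whose machine accepts its shot sequence.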
