A hierarchical framework for movie content analysis: Let computers watch films like humans

In this paper, we propose a hierarchical framework for movie content analysis. Our goal is to enable computers to understand movie content, in particular the "who, what, where, how" of the storyline, by imitating human perception and cognition. The framework consists of two levels. At the low level, we construct a human attention model with temporal information, motivated by the Weber-Fechner Law, to depict the variation of human perception across multiple modalities. At the high level, we focus on semantic understanding of videos at different granularities and simulate human cognition of movie content. Based on this hierarchical framework, we present its applications to semantic retrieval, video summarization, and content filtering. Promising results from users' subjective assessments indicate that the proposed framework is applicable to the automatic analysis of movie content by computers.
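
For background, the Weber-Fechner Law that motivates the attention model is usually stated in the textbook form below. This is only the general psychophysical law, not the paper's attention model, and the symbols $S$, $S_0$, $p$, $k$ and $c$ are notation assumed here for illustration.

```latex
% Weber's law: the just-noticeable change \Delta S is proportional to the stimulus S
\[
  \frac{\Delta S}{S} = c
\]
% Fechner's form: perceived intensity p grows with the logarithm of the stimulus,
% where S_0 is the detection threshold and k is a modality-dependent constant
\[
  p = k \ln \frac{S}{S_0}
\]
```

Under this form, doubling the stimulus always adds the same amount, $k \ln 2$, to the perceived intensity, regardless of the starting level.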
