Indexing of Fictional Video Content for Event Detection and Summarisation

This paper presents an approach to movie video indexing that utilises audiovisual analysis to detect important and meaningful temporal video segments, which we term events. We consider three event classes: dialogues, action sequences, and montages, where the last also includes musical sequences. These three event classes are intuitive for viewers to understand and recognise, and together they account for over 90% of the content of most movies. To detect events, we draw on traditional filmmaking principles and map them to a set of computable low-level audiovisual features. Finite state machines (FSMs) are used to detect when temporal sequences of specific features occur. A set of heuristics, also inspired by filmmaking conventions, is then applied to the outputs of multiple FSMs to detect the required events. We also describe MovieBrowser, a movie search system built upon this approach. The overall approach is evaluated against a ground truth of over twenty-three hours of movie content drawn from various genres, and it consistently achieves high precision and recall for all event classes. Finally, we describe a user experiment designed to evaluate the usefulness of an event-based structure for both searching and browsing movie archives; its results indicate that the proposed approach is useful.
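The FSM stage described above can be illustrated with a minimal Python sketch. It assumes each shot has already been reduced to three hypothetical features (`speech`, `motion`, `music`). The per-class predicates, the thresholds (`min_len`, `max_gap`, the motion cut-offs), and the priority-based merging rule in `label_shots` are all illustrative assumptions; they are not the features, FSMs, or heuristics used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Segment = Tuple[int, int]  # inclusive (first_shot, last_shot) indices


@dataclass(frozen=True)
class Shot:
    # Hypothetical per-shot features; the paper's actual feature set differs.
    speech: bool   # speech detected on the audio track
    motion: float  # normalised motion intensity in [0, 1]
    music: bool    # music detected on the audio track


def run_fsm(shots: List[Shot], predicate: Callable[[Shot], bool],
            min_len: int = 3, max_gap: int = 1) -> List[Segment]:
    """Two-state FSM (IDLE / IN_EVENT) over a shot sequence.

    A matching shot moves IDLE -> IN_EVENT. Inside an event, up to `max_gap`
    consecutive non-matching shots are tolerated; one more closes the event,
    which is kept only if it contains at least `min_len` matching shots.
    """
    events: List[Segment] = []
    in_event = False
    start = last_match = matches = gap = 0
    for i, shot in enumerate(shots):
        hit = predicate(shot)
        if not in_event:
            if hit:
                in_event, start, last_match, matches, gap = True, i, i, 1, 0
        elif hit:
            last_match, matches, gap = i, matches + 1, 0
        else:
            gap += 1
            if gap > max_gap:
                if matches >= min_len:
                    events.append((start, last_match))
                in_event = False
    # Close an event that is still open at the end of the sequence.
    if in_event and matches >= min_len:
        events.append((start, last_match))
    return events


# Hypothetical per-class predicates; thresholds are illustrative only.
PREDICATES: Dict[str, Callable[[Shot], bool]] = {
    "dialogue": lambda s: s.speech and s.motion < 0.3,
    "action": lambda s: s.motion > 0.6,
    "montage": lambda s: s.music and not s.speech,
}


def label_shots(n_shots: int, detections: Dict[str, List[Segment]],
                priority: List[str]) -> List[Optional[str]]:
    """Hypothetical merging heuristic: each shot takes the label of the
    highest-priority class whose FSM output covers it; others stay None."""
    labels: List[Optional[str]] = [None] * n_shots
    for cls in reversed(priority):  # lowest priority first; higher ones overwrite
        for first, last in detections.get(cls, []):
            for i in range(first, last + 1):
                labels[i] = cls
    return labels


# Toy input: four talking shots, four fast-moving shots, three musical shots.
shots = ([Shot(speech=True, motion=0.1, music=False)] * 4
         + [Shot(speech=False, motion=0.9, music=False)] * 4
         + [Shot(speech=False, motion=0.2, music=True)] * 3)
detections = {name: run_fsm(shots, pred) for name, pred in PREDICATES.items()}
labels = label_shots(len(shots), detections, ["action", "dialogue", "montage"])
print(detections)
print(labels)
```

Tolerating a short gap (`max_gap`) lets a sequence survive an isolated non-matching shot, such as a brief cutaway during a conversation, instead of splitting it into two shorter candidates that might each fall below `min_len`.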
