Automatic indexing of video content via the detection of semantic events

The number and size of digital video databases are continuously growing. Unfortunately, most, if not all, of the video content in these databases is stored without any indexing or analysis and without any associated metadata. Where videos do have metadata, it is usually the result of manual annotation rather than automatic indexing. As a result, locating clips and browsing content is difficult, time-consuming and generally inefficient. Automatically indexing movies is particularly difficult given their innovative creation process and the individual style of many film makers. However, a number of underlying film grammar conventions are universally followed, from Hollywood blockbusters to underground movies with limited budgets. These conventions dictate many elements of film making, such as camera placement and editing. By examining how these conventions are used, it is possible to extract information about the events in a movie. This research aims to provide an approach that creates an indexed version of a movie to facilitate easy browsing and efficient retrieval. To achieve this aim, all of the relevant events contained within a movie are detected and classified into a predefined index. The event detection process involves examining the underlying structure of a movie and using audiovisual analysis techniques, supported by machine learning algorithms, to extract information based on this structure. The result is an indexed movie that can be presented to users for browsing and retrieval of relevant events, and that also supports user-specified searching. The indexing approach is evaluated extensively. This evaluation indicates efficient performance of the event detection and retrieval system, and also highlights the subjective nature of video content.
