Multimedia event-based video indexing using time intervals

We propose the time interval multimedia event (TIME) framework as a robust approach for classification of semantic events in multimodal video documents. The representation used in TIME extends Allen's temporal interval relations and allows for proper inclusion of context and synchronization of the heterogeneous information sources involved in multimodal video analysis. To demonstrate the viability of our approach, we evaluate it on two domains: soccer and news broadcasts. For automatic classification of semantic events, we compare three different machine learning techniques, namely the C4.5 decision tree, maximum entropy, and the support vector machine. The results show that semantic video indexing benefits significantly from the TIME framework.
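To make the underlying interval relations concrete, the following is a minimal Python sketch of Allen's thirteen relations between two time intervals, which the TIME representation builds on. It is an illustration under stated assumptions, not the framework's actual definition: the `Interval` type, the relation names, the `eps` tolerance (endpoints closer than `eps` seconds are treated as equal, a hypothetical way to absorb imprecise detector boundaries), and the `relation_features` encoding are all introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A time interval [start, end] in seconds; assumes start < end."""
    start: float
    end: float


def _cmp(a: float, b: float, eps: float) -> int:
    """Compare two time points, treating values within eps as equal.

    Returns -1 if a < b, 0 if a and b are (approximately) equal, +1 if a > b.
    """
    if abs(a - b) <= eps:
        return 0
    return -1 if a < b else 1


def allen_relation(x: Interval, y: Interval, eps: float = 0.0) -> str:
    """Return the Allen relation that holds from interval x to interval y.

    With eps=0 this is Allen's exact classification into 13 relations.
    eps > 0 is a hypothetical tolerance, not part of Allen's definition.
    """
    ss = _cmp(x.start, y.start, eps)
    ee = _cmp(x.end, y.end, eps)
    se = _cmp(x.start, y.end, eps)  # start of x versus end of y
    es = _cmp(x.end, y.start, eps)  # end of x versus start of y

    # Disjoint or touching intervals.
    if es < 0:
        return "before"
    if se > 0:
        return "after"
    if es == 0:
        return "meets"
    if se == 0:
        return "met_by"
    # From here on, x and y share some interior time.
    if ss == 0 and ee == 0:
        return "equals"
    if ss == 0:
        return "starts" if ee < 0 else "started_by"
    if ee == 0:
        return "finishes" if ss > 0 else "finished_by"
    if ss > 0 and ee < 0:
        return "during"
    if ss < 0 and ee > 0:
        return "contains"
    if ss < 0:
        return "overlaps"  # x starts first and ends inside y
    return "overlapped_by"  # x starts inside y and ends after it


def relation_features(reference: Interval, detections, eps: float = 0.0) -> dict:
    """Hypothetical encoding: one binary feature per (detector, relation) pair.

    detections is a list of (detector_name, Interval) tuples; each feature
    records how a detector's output interval relates to the reference interval.
    """
    return {f"{name}:{allen_relation(iv, reference, eps)}": 1
            for name, iv in detections}


if __name__ == "__main__":
    speech = Interval(12.0, 15.0)  # e.g. a detected speech segment
    replay = Interval(14.0, 30.0)  # e.g. a detected replay shot
    print(allen_relation(speech, replay))  # overlaps
    # A 0.2 s gap is "before" exactly, but "meets" with a 0.5 s tolerance.
    print(allen_relation(Interval(0, 10), Interval(10.2, 20)))  # before
    print(allen_relation(Interval(0, 10), Interval(10.2, 20), eps=0.5))  # meets
```

The tolerance illustrates why a pure Allen classification can be brittle for video: outputs of independent detectors rarely share exact frame boundaries, so a small margin lets nearly aligned events fall into the same relation. Feature dictionaries like the one returned by `relation_features` could then be fed to any of the classifiers compared in the paper, although the paper's own feature encoding may differ.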
