Semantic Event Detection using Conditional Random Fields

Semantic event detection has been an active research area in video mining in recent years. One of the challenging problems is how to effectively model the temporal and multi-modality characteristics of video. In this paper, we employ Conditional Random Fields (CRFs) to fuse temporal multi-modality cues for event detection. CRFs are undirected probabilistic models designed for segmenting and labeling sequence data. Compared with traditional Support Vector Machines (SVMs) and Hidden Markov Models (HMMs), CRF-based event detection offers several advantages, including the ability to relax the strong independence assumptions in the state transitions and to avoid a fundamental limitation of directed graphical models. To detect events, we use a three-level framework based on multi-modality fusion and mid-level keywords. The first level extracts audiovisual features, the middle level detects semantic keywords, and the high level infers semantic events from multiple keyword sequences. Experimental results on soccer highlight detection demonstrate that CRFs achieve better performance, particularly on the slice-level measure.
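To make the high-level inference step concrete, the following is a minimal sketch of how a linear-chain CRF could label a sequence of video slices from two keyword streams. It covers decoding only (Viterbi search over summed feature weights), not training. The keyword names, label set, weights, and transition scores are all hypothetical values chosen for illustration; they are not taken from the paper.

```python
# Hypothetical label set: index 0 = non-event slice, index 1 = event slice.
LABELS = ["non-event", "event"]

# Hypothetical feature weights: (modality, keyword, label index) -> weight.
# In a trained CRF these would be learned; here they are made up.
WEIGHTS = {
    ("audio", "excited_speech", 1): 1.5,
    ("audio", "plain_speech", 0): 1.0,
    ("visual", "goal_area", 1): 1.0,
    ("visual", "midfield", 0): 1.0,
}

# Hypothetical transition scores: TRANSITION[i][j] scores label i -> label j.
TRANSITION = [[0.5, -0.5], [-0.5, 0.5]]


def emission_scores(slices):
    """Fuse the keyword streams: sum the weights of all active features per label."""
    return [
        [sum(WEIGHTS.get((modality, keyword, label), 0.0)
             for modality, keyword in s.items())
         for label in range(len(LABELS))]
        for s in slices
    ]


def viterbi(emission, transition):
    """Return the highest-scoring label index sequence for a linear chain."""
    n_labels = len(transition)
    best = list(emission[0])  # best score of any path ending in each label
    back = []                 # back-pointers for positions 1..T-1
    for t in range(1, len(emission)):
        new_best, pointers = [], []
        for j in range(n_labels):
            i_star = max(range(n_labels), key=lambda i: best[i] + transition[i][j])
            pointers.append(i_star)
            new_best.append(best[i_star] + transition[i_star][j] + emission[t][j])
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(range(n_labels), key=lambda j: best[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Four consecutive slices, each described by one audio and one visual keyword.
slices = [
    {"audio": "plain_speech", "visual": "midfield"},
    {"audio": "excited_speech", "visual": "midfield"},
    {"audio": "excited_speech", "visual": "goal_area"},
    {"audio": "plain_speech", "visual": "goal_area"},
]
path = viterbi(emission_scores(slices), TRANSITION)
print([LABELS[j] for j in path])
```

In this toy setting the last slice is labeled as an event even though its audio keyword alone favors a non-event, because the transition scores reward staying in the same label. This is the kind of temporal context a sequence model can exploit and a per-slice classifier cannot.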
