Modality Mixture Projections for Semantic Video Event Detection

Event detection is a fundamental component of many domain-specific applications of video information systems. In recent years, it has attracted considerable interest from practitioners and academics in many areas. Although video event detection has recently been the subject of extensive research, far fewer existing approaches have considered multimodal information and the related efficiency issues. In this paper, we propose Modality Mixture Projections (MMP), a subspace selection technique for fast and accurate video event detection. The approach discriminates between classes while preserving the intramodal geometry of samples within the same class. With this method, feature vectors representing different kinds of multimodal data, from different identities and modalities, can be easily projected onto a unified subspace in which recognition is performed. Furthermore, training is carried out only once, yielding a single unified transformation matrix that projects all modalities. Unlike existing multimodal detection systems, the proposed system works well when some modalities are unavailable. Experimental results on soccer video and TRECVID news video collections demonstrate the effectiveness, efficiency and robustness of MMP on individual recognition tasks compared with existing approaches.
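The abstract does not state the MMP objective, so the following is only a minimal Python sketch of the general idea, not the authors' method: one projection, learned once from concatenated visual and audio feature vectors, maps every sample into a shared subspace where a nearest-class-mean rule separates events from non-events. As a stand-in for the class-discriminating part of MMP it uses a two-class Fisher linear discriminant with a small ridge term; it does not model the intramodal geometry that MMP preserves. Filling a missing modality with that modality's training mean is an assumption made for illustration, and all names (`SharedSubspaceDetector`, `toy_cloud`, `dv`) and the synthetic data are hypothetical.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fisher_direction(class0: Sequence[Vector], class1: Sequence[Vector], ridge: float = 1e-3) -> Vector:
    """Two-class Fisher discriminant: w = (S_W + ridge*I)^-1 (mu1 - mu0)."""
    mu0, mu1 = mean(class0), mean(class1)
    d = len(mu0)
    # Start from ridge*I so the within-class scatter matrix is positive definite.
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for samples, mu in ((class0, mu0), (class1, mu1)):
        for x in samples:
            diff = [x[i] - mu[i] for i in range(d)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    return solve(sw, [mu1[i] - mu0[i] for i in range(d)])


class SharedSubspaceDetector:
    """Hypothetical illustration, NOT the MMP algorithm: a single projection is
    learned once over concatenated [visual | audio] features; a missing
    modality is replaced by its training mean (an assumption) before projecting."""

    def __init__(self, non_events: Sequence[Vector], events: Sequence[Vector], dv: int):
        self.dv = dv  # number of leading visual dimensions; the rest are audio
        self.fill = mean(list(non_events) + list(events))
        self.w = fisher_direction(non_events, events)
        self.centers = [self.project(mean(non_events)), self.project(mean(events))]

    def project(self, x: Vector) -> float:
        # Scalar score in the one-dimensional shared subspace.
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def predict(self, visual: Optional[Vector] = None, audio: Optional[Vector] = None) -> int:
        """Return 1 (event) or 0 (non-event) by nearest projected class mean."""
        if visual is None and audio is None:
            raise ValueError("at least one modality is required")
        v = list(visual) if visual is not None else self.fill[: self.dv]
        a = list(audio) if audio is not None else self.fill[self.dv :]
        z = self.project(v + a)
        return min((0, 1), key=lambda k: abs(z - self.centers[k]))


def toy_cloud(center: Vector, spread: float = 0.1) -> List[Vector]:
    """Synthetic samples: center +/- spread along each axis."""
    points = []
    for i in range(len(center)):
        for s in (spread, -spread):
            p = list(center)
            p[i] += s
            points.append(p)
    return points


# Toy usage: 2 visual + 2 audio dimensions; the audio modality is missing.
detector = SharedSubspaceDetector(toy_cloud([0.0] * 4), toy_cloud([1.0, 0.5, 0.8, 0.3]), dv=2)
print(detector.predict(visual=[1.0, 0.5]))  # -> 1
```

Because the projection is linear, substituting the mean for a missing modality removes only that modality's discriminative contribution, so in this toy setting the remaining modality still decides the class; with real features and correlated modalities this need not hold.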
