Knowledge adaptation for ad hoc multimedia event detection with few exemplars

Multimedia event detection (MED) has a significant impact on many applications. Though video concept annotation has received much research effort, video event detection remains largely unaddressed. Current research mainly focuses on sports and news event detection or abnormality detection in surveillance videos. Our research on this topic is capable of detecting more complicated and generic events. Moreover, the curse of reality, i.e., precisely labeled multimedia content is scarce, necessitates the study on how to attain respectable detection performance using only limited positive examples. Research addressing these two aforementioned issues is still in its infancy. In light of this, we explore Ad Hoc MED, which aims to detect complicated and generic events by using few positive examples. To the best of our knowledge, our work makes the first attempt on this topic. As the information from these few positive examples is limited, we propose to infer knowledge from other multimedia resources to facilitate event detection. Experiments are performed on real-world multimedia archives consisting of several challenging events. The results show that our approach outperforms several other detection algorithms. Most notably, our algorithm outperforms SVM by 43% and 14% comparatively in Average Precision when using Gaussian and Χ2 kernel respectively.

[1]  Ehud Rivlin,et al.  Robust Real-Time Unusual Event Detection using Multiple Fixed-Location Monitors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Yueting Zhuang,et al.  Multi-Label Transfer Learning With Sparse Representation , 2010, IEEE Transactions on Circuits and Systems for Video Technology.

[3]  Nicu Sebe,et al.  Exploiting the entire feature space with sparsity for automatic image annotation , 2011, ACM Multimedia.

[4]  Marcel Worring,et al.  The challenge problem for automated detection of 101 semantic concepts in multimedia , 2006, MM '06.

[5]  Meng Wang,et al.  Optimizing multi-graph learning: towards a unified video annotation scheme , 2007, ACM Multimedia.

[6]  Yueting Zhuang,et al.  Active post-refined multimodality video semantic concept detection with tensor representation , 2008, ACM Multimedia.

[7]  John R. Smith,et al.  On the detection of semantic concepts at TRECVID , 2004, MULTIMEDIA '04.

[8]  Chong-Wah Ngo,et al.  Video event detection using motion relativity and visual relatedness , 2008, ACM Multimedia.

[9]  Daniel P. W. Ellis,et al.  IBM Research and Columbia University TRECVID-2011 Multimedia Event Detection (MED) System , 2011, TRECVID.

[10]  Rong Yan,et al.  Cross-domain video concept detection using adaptive svms , 2007, ACM Multimedia.

[11]  Vincent S. Tseng,et al.  Integrated Mining of Visual Features, Speech Features, and Frequent Patterns for Semantic Video Annotation , 2008, IEEE Transactions on Multimedia.

[12]  Noel E. O'Connor,et al.  Event detection in field sports video using audio-visual features and a support vector Machine , 2005, IEEE Transactions on Circuits and Systems for Video Technology.

[13]  Massimiliano Pontil,et al.  Multi-Task Feature Learning , 2006, NIPS.

[14]  Jiebo Luo,et al.  Event recognition: viewing the world with a third eye , 2008, ACM Multimedia.

[15]  Nuno Vasconcelos,et al.  TaylorBoost: First and second-order boosting algorithms with explicit margin control , 2011, CVPR 2011.

[16]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[17]  Ming-Syan Chen,et al.  Association and Temporal Rule Mining for Post-Filtering of Semantic Concept Detection in Video , 2008, IEEE Transactions on Multimedia.

[18]  Rong Yan,et al.  Can High-Level Concepts Fill the Semantic Gap in Video Retrieval? A Case Study With Broadcast News , 2007, IEEE Transactions on Multimedia.

[19]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[20]  Yi Yang,et al.  Harmonizing Hierarchical Manifolds for Multimedia Document Semantics Understanding and Cross-Media Retrieval , 2008, IEEE Transactions on Multimedia.

[21]  Changsheng Xu,et al.  Live sports event detection based on broadcast video and web-casting text , 2006, MM '06.

[22]  Ivor W. Tsang,et al.  Visual Event Recognition in Videos by Learning from Web Data , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Gang Wang,et al.  Exploring knowledge of sub-domain in a multi-resolution bootstrapping framework for concept detection in news video , 2008, ACM Multimedia.

[24]  Nicu Sebe,et al.  Web Image Annotation Via Subspace-Sparsity Collaborated Feature Selection , 2012, IEEE Transactions on Multimedia.

[25]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Chong-Wah Ngo,et al.  Semantic context transfer across heterogeneous sources for domain adaptive video search , 2009, ACM Multimedia.

[27]  Fei-Fei Li,et al.  What, Where and Who? Telling the Story of an Image by Activity Classification, Scene Recognition and Object Categorization , 2010, Computer Vision: Detection, Recognition and Reconstruction.

[28]  Yi Yang,et al.  A Multimedia Retrieval Framework Based on Semi-Supervised Ranking and Relevance Feedback , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Min Chen,et al.  Video Semantic Event/Concept Detection Using a Subspace-Based Multimedia Data Mining Framework , 2008, IEEE Transactions on Multimedia.

[30]  Wei Liu,et al.  Double Fusion for Multimedia Event Detection , 2012, MMM.