Video Event Detection Using Temporal Pyramids of Visual Semantics with Kernel Optimization and Model Subspace Boosting

In this study, we present a system for video event classification that generates a temporal pyramid of static visual semantics using minimum-value, maximum-value, and average-value aggregation techniques. Kernel optimization and model subspace boosting are then applied to customize the pyramid for each event. SVM models are independently trained for each level in the pyramid using kernel selection according to 3-fold cross-validation. Kernels that both enforce static temporal order and permit temporal alignment are evaluated. Model subspace boosting is used to select the best combination of pyramid levels and aggregation techniques for each event. The NIST TRECVID Multimedia Event Detection (MED) 2011 dataset was used for evaluation. Results demonstrate that kernel optimizations using both temporally static and dynamic kernels together achieves better performance than any one particular method alone. In addition, model sub-space boosting reduces the size of the model by 80%, while maintaining 96% of the performance gain.

[1]  Alberto Del Bimbo,et al.  Video event classification using string kernels , 2010, Multimedia Tools and Applications.

[2]  Daniel P. W. Ellis,et al.  IBM Research and Columbia University TRECVID-2011 Multimedia Event Detection (MED) System , 2011, TRECVID.

[3]  Christus,et al.  A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins , 2022 .

[4]  Chong-Wah Ngo,et al.  Video event detection using motion relativity and visual relatedness , 2008, ACM Multimedia.

[5]  James M. Rehg,et al.  Temporal causality for the analysis of visual events , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  Rong Yan,et al.  Model-shared subspace boosting for multi-label classification , 2007, KDD '07.

[7]  Dong Xu,et al.  Video Event Recognition Using Kernel Methods with Multilevel Temporal Alignment , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Rong Yan,et al.  Large-scale multimedia semantic concept modeling using robust subspace bagging and MapReduce , 2009, LS-MMRM '09.

[9]  Werner Bailer,et al.  A Feature Sequence Kernel for Video Concept Classification , 2011, MMM.

[10]  Apostol Natsev,et al.  Web-based information content and its application to concept-based video retrieval , 2008, CIVR '08.

[11]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[12]  Gang Hua,et al.  Semantic Model Vectors for Complex Video Event Recognition , 2012, IEEE Transactions on Multimedia.

[13]  John R. Smith,et al.  IBM Research TRECVID-2009 Video Retrieval System , 2009, TRECVID.