Paper information - Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach

We present a compositional model for video event detection. A video is modeled using a collection of both global and segment-level features, and kernel functions are employed for similarity comparisons. The locations of salient, discriminative video segments are treated as a latent variable, allowing the model to explicitly ignore portions of the video that are unimportant for classification. We define a novel multiple kernel learning (MKL) latent support vector machine (SVM) that combines and re-weights multiple feature types in a principled fashion while operating within the latent variable framework. The compositional nature of the proposed model allows it to respond directly to the challenges of temporal clutter and intra-class variation, which are prevalent in unconstrained internet videos. Experimental results on the TRECVID Multimedia Event Detection 2011 (MED11) dataset demonstrate the efficacy of the method.
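To make the idea concrete, the following is a minimal sketch of how a score could be computed when the salient segment is a latent variable and several kernels are combined with learned weights. It is not the authors' implementation: the function names, the choice of a linear kernel for global features and an RBF kernel for segment features, the single-segment latent variable, and the data layout (a dict with `"global"` and `"segments"` entries) are all assumptions made for illustration. Training (learning the kernel weights and support coefficients) is not shown.

```python
import numpy as np

def linear_kernel(a, b):
    # Assumed kernel for global (whole-video) features.
    return float(np.dot(a, b))

def rbf_kernel(a, b, gamma=1.0):
    # Assumed kernel for segment-level features.
    diff = a - b
    return float(np.exp(-gamma * np.dot(diff, diff)))

def combined_kernel(video_a, seg_a, video_b, seg_b, weights):
    # Weighted sum of per-feature kernels, in the spirit of MKL.
    # weights[0] scales the global kernel, weights[1] the segment kernel.
    k_global = linear_kernel(video_a["global"], video_b["global"])
    k_segment = rbf_kernel(video_a["segments"][seg_a],
                           video_b["segments"][seg_b])
    return weights[0] * k_global + weights[1] * k_segment

def score_video(video, support, alphas, labels, bias, weights):
    # Latent inference: try every candidate segment h and keep the one
    # that maximizes the kernelized decision value.
    # `support` is a list of (support_video, chosen_segment_index) pairs.
    best_score, best_seg = -np.inf, None
    for h in range(len(video["segments"])):
        s = bias
        for (sv, sv_seg), a, y in zip(support, alphas, labels):
            s += a * y * combined_kernel(sv, sv_seg, video, h, weights)
        if s > best_score:
            best_score, best_seg = s, h
    return best_score, best_seg
```

In this sketch, segments that do not resemble the salient segments of the support videos contribute little to the score, which is one way to read the abstract's statement that the model can ignore unimportant portions of a video.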
