A Feature Sequence Kernel for Video Concept Classification

Kernel methods such as Support Vector Machines are widely applied to classification problems, including concept detection in video. Nonetheless, issues such as modeling the specific distance functions of feature descriptors or the temporal sequence of features in the kernel have received comparatively little attention in multimedia research. We review work on kernels for commonly used MPEG-7 visual features and propose a kernel for matching temporal sequences of these features. The sequence kernel is based on ideas from string matching, but it does not require discretization of the input feature vectors, and it handles partial matches and gaps. Evaluation on the TRECVID 2007 high-level feature extraction data set shows that the sequence kernel clearly outperforms the radial basis function (RBF) kernel and the MPEG-7 visual feature kernels using only single key frames.
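
To make the idea of string-style matching on undiscretized feature vectors concrete, the following is a minimal illustrative sketch, not the kernel defined in the paper. It assumes a longest-common-subsequence style dynamic program in which the exact symbol comparison of string matching is replaced by an RBF similarity between real-valued vectors. The function names, the `gamma` and `threshold` parameters, and the normalization by the shorter sequence length are all hypothetical choices made for illustration. A similarity built this way is also not guaranteed to be positive semi-definite, so it may not be a valid kernel without further treatment.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_similarity(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """RBF similarity in (0, 1] between two feature vectors of equal length."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sequence_similarity(
    a: List[Vector],
    b: List[Vector],
    gamma: float = 1.0,       # hypothetical RBF width
    threshold: float = 0.5,   # hypothetical minimum similarity to count as a match
) -> float:
    """LCS-style alignment score between two sequences of feature vectors.

    Elements may be skipped in either sequence (gaps), and a pair of
    elements contributes its similarity value rather than a hard 0/1
    match (partial matches). The score is normalized to [0, 1].
    """
    if not a or not b:
        return 0.0
    n, m = len(a), len(b)
    # dp[i][j] holds the best accumulated score for a[:i] versus b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = rbf_similarity(a[i - 1], b[j - 1], gamma)
            matched = dp[i - 1][j - 1] + (s if s >= threshold else 0.0)
            # Either align the two elements or skip one of them (a gap).
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], matched)
    return dp[n][m] / min(n, m)


if __name__ == "__main__":
    shot_a = [[0.0], [1.0], [2.0]]
    shot_b = [[0.0], [5.0], [1.0], [2.0]]  # same content with one extra frame
    print(sequence_similarity(shot_a, shot_b))
```

In this sketch, the extra frame in `shot_b` is skipped as a gap, so the two sequences still align fully. Because no quantization into a symbol alphabet takes place, near-identical frames contribute a score slightly below one instead of being forced into a match or a mismatch.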
