Paper Information - Video Event Classification Using Bag of Words and String Kernels

The recognition of events in videos is a relevant and challenging task of automatic semantic video analysis. At present, one of the most successful frameworks used for object recognition tasks is the bag-of-words (BoW) approach. However, this approach does not model the temporal information of the video stream. In this paper we present a method to introduce temporal information within the BoW approach. Events are modeled as sequences of histograms of visual features, computed from each frame using the traditional BoW model. The sequences are treated as strings in which each histogram is considered a character. Because the sequences vary in size with the length of the video clip, they are classified using SVM classifiers with a string kernel based on the Needleman-Wunsch edit distance. Experimental results on two datasets, soccer video and TRECVID 2005, demonstrate the validity of the proposed approach.
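The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details, so everything below is an assumption made for illustration: the function names, the nearest-codeword assignment, the chi-square distance as the substitution cost between two "characters" (histograms), the constant `gap_cost`, and the exponential form `exp(-gamma * d)` of the kernel. It is not the authors' implementation.

```python
import math

def bow_histogram(descriptors, vocabulary):
    """L1-normalised histogram of visual words for one frame.

    Each descriptor is assigned to its nearest codeword (squared Euclidean
    distance). Hard nearest-word assignment is an assumption of this sketch.
    """
    counts = [0.0] * len(vocabulary)
    for desc in descriptors:
        dists = [sum((x - y) ** 2 for x, y in zip(desc, word)) for word in vocabulary]
        counts[dists.index(min(dists))] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts

def chi2_distance(h1, h2):
    """Chi-square distance between two histograms (assumed substitution cost)."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

def nw_edit_distance(seq_a, seq_b, gap_cost=1.0):
    """Global alignment cost (Needleman-Wunsch style dynamic programming)
    between two sequences of histograms, possibly of different lengths."""
    n, m = len(seq_a), len(seq_b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap_cost          # delete every frame of seq_a[:i]
    for j in range(1, m + 1):
        d[0][j] = j * gap_cost          # insert every frame of seq_b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = d[i - 1][j - 1] + chi2_distance(seq_a[i - 1], seq_b[j - 1])
            delete = d[i - 1][j] + gap_cost
            insert = d[i][j - 1] + gap_cost
            d[i][j] = min(substitute, delete, insert)
    return d[n][m]

def string_kernel(seq_a, seq_b, gamma=1.0):
    """Similarity derived from the edit distance (assumed exponential form)."""
    return math.exp(-gamma * nw_edit_distance(seq_a, seq_b))

def gram_matrix(seqs_x, seqs_y, gamma=1.0):
    """Pairwise kernel values between two lists of histogram sequences."""
    return [[string_kernel(a, b, gamma) for b in seqs_y] for a in seqs_x]

# Two toy "clips" of different length over a two-word vocabulary.
clip_a = [[1.0, 0.0], [0.0, 1.0]]
clip_b = [[1.0, 0.0]]
print(nw_edit_distance(clip_a, clip_b))
print(gram_matrix([clip_a, clip_b], [clip_a, clip_b]))
```

A matrix produced by `gram_matrix` could be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Note that a kernel built from an edit distance in this way is not guaranteed to be positive semidefinite in general, so this should be read as a sketch of the idea rather than a validated classifier.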
