Action recognition based on spatial-temporal pyramid sparse coding

This paper introduces a novel video representation, termed spatial-temporal pyramid sparse coding (STPSC), which characterizes both the spatial and temporal aspects of a video. Specifically, the co-occurrences of visual words are computed with respect to the spatial layout and the temporal ordering of the features in the video, so the representation captures both the spatial arrangement of the words and the temporal relationships between them. The representation is motivated by spatial pyramid matching (SPM), a technique used to recognize scenes in still images. We extend SPM to video analysis and combine it with sparse coding. First, dense feature points are extracted and described by displacement information from a dense optical flow field. Then sparse coding is used to quantize the feature descriptors, and the spatial-temporal pyramid is introduced to represent an action. Finally, an SVM is used to classify the videos. Experimental results show improvements over state-of-the-art techniques on a public action dataset.
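The pooling step of the pipeline above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the sparse codes have already been computed, that each feature point carries an (x, y, t) position, that pyramid level l splits each of the three axes into 2^l cells, and that codes are max-pooled within each cell (the pooling rule used for image SPM with sparse coding, which the abstract does not confirm for STPSC). The function name `st_pyramid_pool` and its parameters are hypothetical.

```python
import numpy as np

def st_pyramid_pool(codes, positions, extent, levels=3):
    """Max-pool sparse codes over a spatial-temporal pyramid (illustrative sketch).

    codes:     (N, K) sparse codes, one row per feature point
    positions: (N, 3) point coordinates as (x, y, t)
    extent:    (width, height, duration) of the video volume
    levels:    number of pyramid levels; level l has 2**l cells per axis (assumption)
    """
    codes = np.abs(np.asarray(codes, dtype=float))
    positions = np.asarray(positions, dtype=float)
    extent = np.asarray(extent, dtype=float)
    n_atoms = codes.shape[1]

    pooled = []
    for level in range(levels):
        cells = 2 ** level
        # Cell index of every point along x, y and t, clipped to the valid range
        idx = np.floor(positions / extent * cells).astype(int)
        idx = np.clip(idx, 0, cells - 1)
        # Flatten the 3-D cell index into a single cell id
        flat = (idx[:, 0] * cells + idx[:, 1]) * cells + idx[:, 2]
        # Max pooling: keep the largest response of each atom in each cell
        level_feat = np.zeros((cells ** 3, n_atoms))
        np.maximum.at(level_feat, flat, codes)
        pooled.append(level_feat.ravel())

    # Concatenate all levels and L2-normalize the final video descriptor
    feature = np.concatenate(pooled)
    norm = np.linalg.norm(feature)
    return feature / norm if norm > 0 else feature
```

With K dictionary atoms and the default three levels, the descriptor has K * (1 + 8 + 64) entries; this vector is what would be passed to the SVM in the final step.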
