A Non-negative Low Rank and Sparse Model for Action Recognition

In this paper, we present a new method for video action recognition. The main contributions are two-fold. First, we propose local coordinates contained descriptors (LCCD) instead of appearance-only descriptors. We encode global geometric correspondence by combining descriptors with spatio-temporal locations, which is different from previous methods such as spatio-temporal pyramid matching (STPM). Spatio-temporal location is taken as part of the coding step by utilizing LCCD. Second, a novel non-negative low rank and sparse coding model is developed to encode descriptors for action recognition. Motivated by low rank matrix recovery and completion, local descriptors in a spatio-temporal neighborhood are similar and should be approximately low rank. The objective function is obtained by seeking non-negative low rank and sparse coefficients for local descriptors. The learned coefficients can capture location information and the structure of descriptors, hence improve the discriminability of representations. Experiments validate that our method achieves the state-of-the-art results on two benchmark datasets.

[1]  Yasuyuki Matsushita,et al.  Camera calibration with lens distortion from low-rank textures , 2011, CVPR 2011.

[2]  Liang-Tien Chia,et al.  Local features are not lonely – Laplacian sparse coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  William Brendel,et al.  Activities as Time Series of Human Postures , 2010, ECCV.

[4]  Quoc V. Le,et al.  Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.

[5]  Nenghai Yu,et al.  Non-negative low rank and sparse graph for semi-supervised learning , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Narendra Ahuja,et al.  Low-Rank Sparse Learning for Robust Visual Tracking , 2012, ECCV.

[8]  Haibin Ling,et al.  3D R Transform on Spatio-temporal Interest Points for Action Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Larry S. Davis,et al.  Learning Structured Low-Rank Representations for Image Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Won Jong Jeon,et al.  Spatio-temporal pyramid matching for sports videos , 2008, MIR '08.

[11]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  Tsuhan Chen,et al.  Spatio-Temporal Phrases for Activity Recognition , 2012, ECCV.

[13]  Qi Tian,et al.  Image classification by non-negative sparse coding, low-rank and sparse decomposition , 2011, CVPR 2011.

[14]  Changsheng Xu,et al.  Low-Rank Sparse Coding for Image Classification , 2013, 2013 IEEE International Conference on Computer Vision.

[15]  Mubarak Shah,et al.  Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Cordelia Schmid,et al.  Dense Trajectories and Motion Boundary Descriptors for Action Recognition , 2013, International Journal of Computer Vision.

[17]  Adriana Kovashka,et al.  Learning a hierarchy of discriminative space-time neighborhood features for human action recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, ICPR 2004.