Context and locality constrained linear coding for human action recognition

The Bag of Words (BOW) method with spatio-temporal local features has achieved strong performance in human action recognition. However, most existing BOW approaches based on vector quantization (VQ) neglect the contextual information of each descriptor and suffer from serious quantization error. There are two main reasons for this: first, each local feature is assigned to only one label; second, the information about the spatial layout of the features is disregarded. In this paper, we present a novel and effective coding method called context and locality constrained linear coding (CLLC) to overcome these limitations, in which the relationships among local features and their structural information are preserved. After that, a group-wise sparse representation based classification (GSRC) method is used to assign the query sample to the category that yields the smallest reconstruction error. Our method is verified on challenging databases and achieves performance comparable to that of state-of-the-art methods.

We model a novel and effective human action recognition scheme. We propose a context and locality constrained linear coding method to fully describe each local feature. "Decisive features" and "indecisive features" are used to help collect the contextual information. A group-wise sparse representation based classification method is introduced to categorize the query samples. Our encoding problem has both an analytical solution and an approximate solution.
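To make the two steps named in the abstract more concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. The function `llc_code` follows the standard approximated locality-constrained linear coding scheme (k nearest codebook atoms, sum-to-one constraint, closed-form solve) and omits the context term that CLLC adds, since the abstract does not specify it. The function `classify_by_residual` picks the class with the smallest reconstruction error, but uses plain ridge regression as a stand-in for the group-wise sparse representation of GSRC. All names and parameters (`k`, `reg`, `ridge`, `class_dicts`) are hypothetical.

```python
import numpy as np


def llc_code(x, codebook, k=5, reg=1e-4):
    """Approximated locality-constrained code of one descriptor.

    x        : descriptor, shape (d,)
    codebook : atoms, shape (m, d)
    Returns a length-m code that is zero outside the k nearest atoms
    and whose entries sum to one. No context term is modelled here.
    """
    m = codebook.shape[0]
    k = min(k, m)
    # Locality: keep only the k atoms closest to x.
    dists = np.linalg.norm(codebook - x, axis=1)
    idx = np.argsort(dists)[:k]
    # Shift the selected atoms so that x is the origin.
    z = codebook[idx] - x
    cov = z @ z.T
    # Regularise so the small linear system stays well conditioned.
    cov += (reg * np.trace(cov) + 1e-12) * np.eye(k)
    # Closed-form solution of the constrained least-squares problem.
    w = np.linalg.solve(cov, np.ones(k))
    w /= w.sum()
    code = np.zeros(m)
    code[idx] = w
    return code


def classify_by_residual(query, class_dicts, ridge=1e-3):
    """Assign `query` to the class whose dictionary reconstructs it best.

    class_dicts maps a label to an array of shape (n_atoms, d).
    Ridge regression replaces the group-wise sparse coding step
    (an assumption made to keep the sketch short).
    """
    errors = {}
    for label, atoms in class_dicts.items():
        gram = atoms @ atoms.T + ridge * np.eye(atoms.shape[0])
        coef = np.linalg.solve(gram, atoms @ query)
        errors[label] = float(np.linalg.norm(query - coef @ atoms))
    best = min(errors, key=errors.get)
    return best, errors
```

In this sketch the decision rule is the only part taken directly from the abstract: the query is labelled with the category of smallest reconstruction error. The coding and regression details are illustrative choices.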
