3D Action Recognition Using Depth-Based Feature and Locality-Constrained Affine Subspace Coding

We propose a 3D action recognition algorithm that combines depth-based Gradient Local Auto-Correlations (GLAC) features with Locality-constrained Affine Subspace Coding (LASC) to represent human actions in spatio-temporal subsequences of 3D depth videos more discriminatively. First, each depth video sequence is automatically divided into a set of subsequences (i.e., multi-scale sub-actions) using the normalized motion energy vector. Next, GLAC features computed on Depth Motion Maps (DMMs) capture the shape information and motion cues of each sub-action. To obtain a more compact and discriminative representation, LASC is then used to encode the features extracted from the depth video. We show that LASC performs better than existing coding methods such as Locality-constrained Linear Coding (LLC). On all three datasets we obtain competitive results compared to fifteen methods, while using fewer features and less complex models.
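The first two steps above (splitting by normalized motion energy, then accumulating a Depth Motion Map per sub-action) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the sketch assumes that motion energy is the summed absolute difference between consecutive depth frames, that cuts are placed at equal fractions of the cumulative energy, and that a DMM is a single-view accumulation of those differences. The function names `motion_energy`, `split_by_energy`, and `depth_motion_map` are hypothetical.

```python
import numpy as np


def motion_energy(depth_video):
    """Normalized cumulative motion energy, one value per frame transition.

    Assumption: energy is the summed absolute difference of consecutive frames.
    depth_video has shape (T, H, W) with T >= 2.
    """
    video = np.asarray(depth_video, dtype=np.float64)
    if video.shape[0] < 2:
        raise ValueError("need at least two frames")
    energy = np.abs(np.diff(video, axis=0)).sum(axis=(1, 2))  # shape (T-1,)
    total = energy.sum()
    if total == 0:
        # Static video: fall back to a uniform ramp so cuts are evenly spaced.
        return np.arange(1, len(energy) + 1) / len(energy)
    return np.cumsum(energy) / total


def split_by_energy(depth_video, n_parts):
    """Return (start, end) frame ranges holding roughly equal motion energy."""
    cum = motion_energy(depth_video)
    levels = np.arange(1, n_parts) / n_parts
    # Transition i lies between frames i and i+1, hence the +1 offset.
    cuts = np.searchsorted(cum, levels) + 1
    bounds = np.concatenate(([0], cuts, [len(depth_video)]))
    return [(int(s), int(e)) for s, e in zip(bounds[:-1], bounds[1:])]


def depth_motion_map(depth_video):
    """Accumulate absolute frame differences into a single (H, W) map."""
    video = np.asarray(depth_video, dtype=np.float64)
    return np.abs(np.diff(video, axis=0)).sum(axis=0)


# Usage: a toy 5-frame video whose depth grows by 1 per frame.
video = np.arange(5, dtype=float)[:, None, None] * np.ones((5, 2, 2))
segments = split_by_energy(video, n_parts=2)
maps = [depth_motion_map(video[s:e]) for s, e in segments if e - s >= 2]
```

Calling `split_by_energy` with several values of `n_parts` would give sub-actions at multiple temporal scales; each resulting map would then be passed to a feature extractor such as GLAC before coding.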
