Paper information - Hierarchical 3D kernel descriptors for action recognition using depth sequences

Hierarchical 3D kernel descriptors for action recognition using depth sequences

Action recognition is a challenging task because of the intra-class motion variation caused by the diverse styles and durations of performed actions. Previous work on action recognition has focused mainly on hand-crafted features, treating different sources of information independently and simply combining them before classification. In this paper we study action recognition from depth sequences captured by RGB-D cameras using kernel descriptors. Kernel descriptors provide an elegant way to combine a variety of information sources and can easily be applied within a hierarchical structure. We show that kernel descriptors computed over pixel-level attributes in video sequences achieve strong results compared with state-of-the-art methods. Following the success of kernel descriptors [1] on object recognition tasks, we employ 3D kernel descriptors, a unified framework for capturing pixel-level attributes and turning them into discriminative low-level features on individual 3D patches. We use the efficient match kernel (EMK) [2] as the next level of our hierarchical structure to abstract mid-level features for classification. Through extensive experiments we demonstrate that using pixel-level attributes in the hierarchical architecture of our 3D kernel descriptor and EMK achieves superior performance on standard depth sequence benchmarks.
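To make the set-level step in the abstract more concrete, the following is a minimal sketch of the general idea behind a match kernel between two sets of local (patch-level) features: the exact kernel averages a Gaussian kernel over all cross-set pairs, and a finite-dimensional feature map lets each set be summarized by one averaged vector whose dot product approximates that kernel. This is not the paper's implementation. The random Fourier feature map is a stand-in chosen for brevity (the efficient match kernel cited in the abstract learns its low-dimensional map instead), and all function names, `gamma`, and the toy 3-dimensional features are assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(x, y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def sum_match_kernel(X, Y, gamma):
    """Exact set kernel: the Gaussian kernel averaged over all cross-set pairs."""
    total = sum(gaussian_kernel(x, y, gamma) for x in X for y in Y)
    return total / (len(X) * len(Y))


def make_random_fourier_map(dim, n_features, gamma, seed=0):
    """Return phi such that dot(phi(x), phi(y)) approximates gaussian_kernel(x, y).

    Assumption: random Fourier features stand in for a learned feature map.
    """
    rng = random.Random(seed)
    std = math.sqrt(2.0 * gamma)  # frequencies are drawn from N(0, 2 * gamma)
    W = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def phi(x):
        return [
            norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b_j)
            for w, b_j in zip(W, b)
        ]

    return phi


def set_feature(X, phi):
    """Set-level feature: the mean of the mapped local features."""
    mapped = [phi(x) for x in X]
    return [sum(column) / len(mapped) for column in zip(*mapped)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Toy data: two "videos", each a set of hypothetical 3-dimensional patch features.
rng = random.Random(1)
X = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(20)]
Y = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(15)]

gamma = 0.5
phi = make_random_fourier_map(dim=3, n_features=4000, gamma=gamma)

exact = sum_match_kernel(X, Y, gamma)
approx = dot(set_feature(X, phi), set_feature(Y, phi))
print("exact sum match kernel:", exact)
print("dot product of set-level features:", approx)
```

The design point is that the averaged vector is computed once per set, so comparing two sets costs a single dot product rather than a sum over every pair of local features, and the set-level vectors can be passed directly to a linear classifier.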

[1] Silvio Savarese,et al. Learning context for collective activity recognition , 2011, CVPR 2011.

[2] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[3] Cristian Sminchisescu,et al. Efficient Match Kernel between Sets of Features for Visual Recognition , 2009, NIPS.

[4] Nuno Vasconcelos,et al. Recognizing Activities by Attribute Dynamics , 2012, NIPS.

[5] Zicheng Liu,et al. HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[6] Bernhard Schölkopf,et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[7] Ling Shao,et al. Learning Discriminative Representations from RGB-D Video Data , 2013, IJCAI.

[8] Richard Bowden,et al. Hollywood 3D: Recognizing Actions in 3D Natural Scenes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[9] Ying Wu,et al. Robust 3D Action Recognition with Random Occupancy Patterns , 2012, ECCV.

[10] Ying Wu,et al. Mining actionlet ensemble for action recognition with depth cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12] Xiaodong Yang,et al. Recognizing actions using depth motion maps-based histograms of oriented gradients , 2012, ACM Multimedia.

[13] Z. Liu,et al. A real time system for dynamic hand gesture recognition with a depth sensor , 2012, 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO).

[14] Yunde Jia,et al. Interactive Phrases: Semantic Descriptions for Human Interaction Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15] David Haussler,et al. Convolution kernels on discrete structures , 1999 .

[16] Wanqing Li,et al. Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[17] Cordelia Schmid,et al. A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.

[18] Dong Xu,et al. Recognizing RGB Images by Learning from RGB-D Data , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19] Meinard Müller,et al. Motion templates for automatic classification and retrieval of motion capture data , 2006, SCA '06.

[20] Jake K. Aggarwal,et al. Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[21] Xiaodong Yang,et al. Super Normal Vector for Activity Recognition Using Depth Sequences , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22] Ramakant Nevatia,et al. Recognition and Segmentation of 3-D Human Action Using HMM and Multi-class AdaBoost , 2006, ECCV.

[23] Zhengming Ding,et al. Latent Tensor Transfer Learning for RGB-D Action Recognition , 2014, ACM Multimedia.

[24] Yang Wang,et al. Hidden Part Models for Human Action Recognition: Probabilistic versus Max Margin , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25] Dieter Fox,et al. Kernel Descriptors for Visual Recognition , 2010, NIPS.

[26] Ivan Laptev,et al. On Space-Time Interest Points , 2005, International Journal of Computer Vision.

[27] Mubarak Shah,et al. Recognizing human actions using multiple features , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[28] James M. Keller,et al. Histogram of Oriented Normal Vectors for Object Recognition with a Depth Sensor , 2012, ACCV.

[29] Serge J. Belongie,et al. Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[30] Leonid Sigal,et al. Poselet Key-Framing: A Model for Human Activity Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[31] Jake K. Aggarwal,et al. Spatio-temporal Depth Cuboid Similarity Feature for Activity Recognition Using Depth Camera , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[32] Silvio Savarese,et al. Recognizing human actions by attributes , 2011, CVPR 2011.

[33] Barbara Caputo,et al. Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004.

[34] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).