Paper Information - Human Action Recognition Using a Temporal Hierarchy of Covariance Descriptors on 3D Joint Locations

Human action recognition from videos is a challenging machine vision task with multiple important application domains, such as human-robot/machine interaction, interactive entertainment, multimedia information retrieval, and surveillance. In this paper, we present a novel approach to human action recognition from 3D skeleton sequences extracted from depth data. We use the covariance matrix for skeleton joint locations over time as a discriminative descriptor for a sequence. To encode the relationship between joint movement and time, we deploy multiple covariance matrices over sub-sequences in a hierarchical fashion. The descriptor has a fixed length that is independent of the length of the described sequence. Our experiments show that using the covariance descriptor with an off-the-shelf classification algorithm outperforms the state of the art in action recognition on multiple datasets, captured either via a Kinect-type sensor or a sophisticated motion capture system. We also include an evaluation on a novel large dataset using our own annotation.
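
The descriptor outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions that the abstract does not specify: each frame is flattened into one row of stacked x, y, z joint coordinates, only the upper triangle of each symmetric covariance matrix is kept, and level l of the hierarchy splits the sequence into 2^l non-overlapping sub-sequences. The function names `covariance_descriptor` and `hierarchical_descriptor` and the `levels` parameter are hypothetical.

```python
import numpy as np


def covariance_descriptor(frames):
    """Covariance of joint coordinates over time for one (sub-)sequence.

    frames: (T, D) array, one row per frame, D = 3 * number of joints.
    Returns the upper triangle of the (D, D) covariance matrix as a vector.
    """
    # rowvar=False treats each column (one coordinate) as a variable
    # and each row (one frame) as an observation.
    cov = np.atleast_2d(np.cov(frames, rowvar=False))
    # The matrix is symmetric, so the upper triangle carries all of it
    # (assumption: the abstract does not say how the matrix is vectorized).
    return cov[np.triu_indices(cov.shape[0])]


def hierarchical_descriptor(frames, levels=2):
    """Concatenate covariance descriptors over a temporal hierarchy.

    Assumption: level l uses 2**l equal, non-overlapping sub-sequences.
    """
    frames = np.asarray(frames, dtype=float)
    if frames.shape[0] < 2 ** levels:
        # Every sub-sequence needs at least two frames for a covariance.
        raise ValueError("sequence too short for the requested levels")
    parts = []
    for level in range(levels):
        for chunk in np.array_split(frames, 2 ** level, axis=0):
            parts.append(covariance_descriptor(chunk))
    return np.concatenate(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    short_seq = rng.normal(size=(30, 12))  # 30 frames, 4 joints
    long_seq = rng.normal(size=(75, 12))   # 75 frames, 4 joints
    print(hierarchical_descriptor(short_seq).shape)
    print(hierarchical_descriptor(long_seq).shape)
```

With D = 12 coordinates, each covariance matrix contributes 12 * 13 / 2 = 78 values, and two levels use 1 + 2 = 3 matrices, so both sequences map to a vector of length 234 regardless of their frame counts. A vector of this kind could then be passed to an off-the-shelf classifier, as the abstract describes.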
