Human Action Recognition by Mining Discriminative Segment with Novel Skeleton Joint Feature

In this paper, we present a "key segment" mining approach for human action recognition. Our model locates discriminative segments in action samples via multiple instance learning. Moreover, we propose a dynamic pooling approach that automatically finds the optimal segment length for each action sample. In addition, we propose an effective feature for action recognition with 3D skeleton joints; it captures informative motion and shape cues of skeletons and leads to a compact, discriminative representation. Experimental results validate the effectiveness of the proposed method on two benchmark datasets, MSR Action3D and UCF-Kinect. Moreover, our method achieves higher accuracy than previous methods that use only skeleton data on MSR Action3D, and reaches state-of-the-art performance on UCF-Kinect.
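The core idea above, scoring candidate temporal segments and keeping the highest-scoring one as the "key segment", with dynamic pooling searching over candidate segment lengths, can be sketched as follows. This is a minimal illustration under assumed simplifications: a linear scoring model over mean-pooled per-frame features, and a fixed candidate-length set; the function names and the scoring rule are hypothetical, not the paper's exact formulation.

```python
# Sketch of MIL-style key-segment mining with dynamic pooling over
# candidate segment lengths. Each frame is a feature vector (a list of
# floats); a sample is a sequence of frames (a "bag" of segments).

def segment_score(frames, weights):
    """Score a segment with a linear model on its mean-pooled feature.
    (The actual per-segment feature and classifier are assumptions.)"""
    dim = len(weights)
    pooled = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return sum(w * x for w, x in zip(weights, pooled))

def mine_key_segment(sequence, weights, lengths=(4, 8, 16)):
    """Return (best_score, start, length) over all candidate segments.

    `lengths` is the candidate set that dynamic pooling searches over;
    the sample-level score is the max over segments, in the spirit of
    multiple instance learning (the positive bag contains at least one
    discriminative instance).
    """
    best = (float("-inf"), 0, 0)
    for seg_len in lengths:
        if seg_len > len(sequence):
            continue
        for start in range(len(sequence) - seg_len + 1):
            score = segment_score(sequence[start:start + seg_len], weights)
            if score > best[0]:
                best = (score, start, seg_len)
    return best

# Toy example: a burst of "motion" (nonzero features) in frames 5..12
# should be located as the key segment.
sequence = [[1.0, 0.0] if 5 <= i < 13 else [0.0, 0.0] for i in range(20)]
print(mine_key_segment(sequence, weights=[1.0, 0.0]))
```

In training, the segment classifier and the segment selection would be optimized jointly (e.g. in an MI-SVM-style alternation); the sketch shows only the inference-time max-pooling step.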
