Effective Active Skeleton Representation for Low Latency Human Action Recognition

With the development of depth sensors, low latency 3D human action recognition has become increasingly important in interaction systems, where responding with minimal latency is critical. High latency not only significantly degrades the user's interaction experience, but also makes certain interaction systems, e.g., gesture control or electronic gaming, unattractive. In this paper, we propose a novel active skeleton representation for low latency human action recognition. First, we encode each limb of the human skeleton into a state through a Markov random field. The active skeleton is then represented by aggregating the encoded features of the individual limbs. Finally, we propose a multi-channel multiple instance learning method with maximum-pattern-margin to further boost the performance of the existing model. Our method is robust in computing features related to joint positions and effective in handling unsegmented sequences. Experiments on the MSR Action3D, MSR DailyActivity3D, and Huawei/3DLife-2013 datasets demonstrate the effectiveness of the model with the proposed representation and its superiority over state-of-the-art low latency recognition approaches.
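The first two steps (per-limb state encoding, then aggregation) can be illustrated with a minimal sketch. The abstract does not specify the Markov random field, so everything here is an assumption made for illustration: a chain-structured model over time for each limb, squared distance to hypothetical prototype directions as the unary cost, a Potts penalty `smooth` between consecutive frames, and one-hot concatenation as the aggregation. The names `encode_limb_states` and `aggregate_limbs` are hypothetical, and the multi-channel multiple instance learning step is not covered.

```python
def encode_limb_states(directions, prototypes, smooth=0.5):
    """MAP state per frame for one limb on a chain MRF (min-sum / Viterbi).

    directions: list of T limb direction vectors (one per frame).
    prototypes: list of K reference directions, one per state (assumed given).
    smooth:     Potts penalty paid when consecutive frames change state.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    num_states = len(prototypes)
    # Unary cost: how poorly each prototype explains each frame.
    unary = [[sq_dist(d, p) for p in prototypes] for d in directions]

    cost = list(unary[0])
    back = []  # back[t-1][j]: best previous state when frame t is in state j
    for t in range(1, len(unary)):
        new_cost, pointers = [], []
        for j in range(num_states):
            candidates = [cost[i] + (0.0 if i == j else smooth)
                          for i in range(num_states)]
            best = min(range(num_states), key=candidates.__getitem__)
            pointers.append(best)
            new_cost.append(candidates[best] + unary[t][j])
        cost = new_cost
        back.append(pointers)

    # Backtrack from the cheapest final state.
    state = min(range(num_states), key=cost.__getitem__)
    states = [state]
    for pointers in reversed(back):
        state = pointers[state]
        states.append(state)
    states.reverse()
    return states


def aggregate_limbs(limb_states, num_states):
    """Concatenate one-hot limb states per frame into one feature vector."""
    frames = []
    for per_frame in zip(*limb_states):
        vec = []
        for s in per_frame:
            vec.extend(1 if k == s else 0 for k in range(num_states))
        frames.append(vec)
    return frames


# Toy usage: one noisy frame (index 2) is smoothed away by the pairwise term.
prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
arm = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.45, 0.55, 0.0],
       [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
arm_states = encode_limb_states(arm, prototypes, smooth=0.5)
features = aggregate_limbs([arm_states, arm_states], num_states=2)
```

With `smooth=0.0` the same call reduces to independent nearest-prototype assignment per frame, which makes the role of the pairwise term easy to inspect.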
