Leveraging the Path Signature for Skeleton-based Human Action Recognition

Human action recognition in videos is one of the most challenging tasks in computer vision. One important issue is how to design discriminative features for representing spatial context and temporal dynamics. Here, we introduce a path signature feature to encode information from intra-frame and inter-frame contexts. A key step towards leveraging this feature is to construct the proper trajectories (paths) for the data stream. In each frame, the correlated constraints of human joints are treated as small paths, and the spatial path signature features are extracted from them. In video data, the evolution of these spatial features over time can also be regarded as paths from which the temporal path signature features are extracted. Finally, all these features are concatenated to constitute the input vector of a fully connected neural network for action classification. Experimental results on four standard benchmark action datasets, J-HMDB, SBU Dataset, Berkeley MHAD, and NTU RGB+D, demonstrate that the proposed approach achieves state-of-the-art accuracy even in comparison with recent deep-learning-based models.
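To make the spatial and temporal feature construction concrete, the following is a minimal sketch, not the authors' implementation. It assumes piecewise-linear paths, a signature truncated at level 2, and a toy skeleton; the function names (`signature_level2`, `signature_features`, `skeleton_features`), the joint pairs, and the array shapes are hypothetical choices made for illustration only.

```python
import numpy as np

def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: array-like of shape (n_points, d).
    Returns (s1, s2) with shapes (d,) and (d, d).
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment:
        # s2 <- s2 + s1 (x) delta + (delta (x) delta) / 2, using s1 before its update.
        s2 += np.outer(s1, delta) + np.outer(delta, delta) / 2.0
        s1 += delta
    return s1, s2

def signature_features(path):
    """Flatten the truncated signature into one feature vector of length d + d*d."""
    s1, s2 = signature_level2(path)
    return np.concatenate([s1, s2.ravel()])

def skeleton_features(seq, pairs):
    """Concatenate spatial and temporal signature features.

    seq: array of shape (T, J, d) with joint coordinates per frame.
    pairs: joint index tuples treated as small spatial paths (an assumption).
    """
    # Spatial: one small path per joint pair in every frame.
    spatial = np.stack([
        np.stack([signature_features(frame[list(p)]) for p in pairs])
        for frame in seq
    ])  # shape (T, P, d + d*d)
    # Temporal: the evolution of each pair's spatial feature over frames is a path.
    temporal = np.stack([
        signature_features(spatial[:, k, :]) for k in range(len(pairs))
    ])
    return np.concatenate([spatial.ravel(), temporal.ravel()])

# Toy data: 5 frames, 3 joints, 2-D coordinates (random, for illustration only).
rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 3, 2))
vec = skeleton_features(seq, pairs=[(0, 1), (1, 2)])
print(vec.shape)  # (144,)
```

The resulting vector is what would be fed to a fully connected classifier. The truncation level and the choice of joint pairs determine its length, so in practice both would need tuning per dataset.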
