Developing the Path Signature Methodology and its Application to Landmark-based Human Action Recognition

Landmark-based human action recognition in videos is a challenging task in computer vision. A key step is to design a generic approach that generates discriminative features for both the spatial structure and the temporal dynamics. To this end, we regard the evolving landmark data as a high-dimensional path and apply path signature techniques to obtain an expressive, robust, non-linear, and interpretable representation of the sequential events. Rather than extracting signature features from the raw path, we propose path disintegrations and path transformations as preprocessing steps. Path disintegrations linearly turn a high-dimensional path into a collection of lower-dimensional paths; some of these paths lie in pose space, while others are defined over a multiscale collection of temporal intervals. Path transformations decorate the paths with additional coordinates in standard ways, so that the truncated signatures of the transformed paths expose additional features. For the spatial representation, we apply the signature transform to vectorize the paths that arise from pose disintegration; for the temporal representation, we apply it again to describe how this vectorization evolves. Finally, all the features are concatenated to form the input vector of a linear, single-hidden-layer, fully connected network for classification. Experimental results on four datasets demonstrate that the proposed feature set, combined with only a shallow linear network and DropConnect, is effective, achieves results comparable to state-of-the-art deep networks, and remains interpretable.
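The pipeline described above can be illustrated with a minimal NumPy sketch. Several choices here are assumptions for illustration, not details taken from the method itself. Signatures are truncated at depth 2 and computed with Chen's identity for piecewise-linear paths. The hypothetical `PARTS` grouping of joints stands in for pose disintegration. Lead-lag and time augmentation stand in for the "standard" path transformations. Dyadic splits of the frame range stand in for the multiscale temporal intervals. The function names (`signature_depth2`, `lead_lag`, `time_augment`, `dyadic_intervals`, `action_features`) are hypothetical.

```python
import numpy as np


def signature_depth2(path):
    """Depth-2 truncated signature of a piecewise-linear path of shape (n, d).

    Returns the d level-1 terms followed by the d*d level-2 terms (row-major).
    """
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        # Chen's identity: appending a linear segment with increment dx adds
        # S1 (x) dx plus dx (x) dx / 2 to the level-2 term.
        level2 += np.outer(level1, dx) + 0.5 * np.outer(dx, dx)
        level1 += dx
    return np.concatenate([level1, level2.ravel()])


def lead_lag(path):
    """Lead-lag transform: (x0,x0), (x1,x0), (x1,x1), (x2,x1), ..."""
    doubled = np.repeat(path, 2, axis=0)
    return np.hstack([doubled[1:], doubled[:-1]])


def time_augment(path):
    """Prepend a time coordinate running from 0 to 1."""
    t = np.linspace(0.0, 1.0, len(path))[:, None]
    return np.hstack([t, path])


def dyadic_intervals(n_frames, levels):
    """Split frames [0, n_frames) into 2**k overlapping pieces, k = 0..levels."""
    out = []
    for k in range(levels + 1):
        edges = np.linspace(0, n_frames - 1, 2**k + 1).round().astype(int)
        for a, b in zip(edges[:-1], edges[1:]):
            out.append((int(a), int(b) + 1))  # half-open; neighbours share a frame
    return out


# Hypothetical body-part grouping; joint indices are illustrative only.
PARTS = [[0, 1, 2], [3, 4, 5]]


def action_features(landmarks, parts=PARTS, levels=2):
    """landmarks: (T, J, 2) array of 2-D landmark coordinates over T frames."""
    # Spatial step: in each frame, read each part's joints in order as a short
    # path in pose space and vectorize it with its truncated signature.
    spatial = np.stack([
        np.concatenate([signature_depth2(frame[p]) for p in parts])
        for frame in landmarks
    ])
    # Temporal step: transform the evolving spatial vector and take signatures
    # over a multiscale collection of dyadic time intervals.
    feats = []
    for a, b in dyadic_intervals(len(spatial), levels):
        piece = time_augment(lead_lag(spatial[a:b]))
        feats.append(signature_depth2(piece))
    return np.concatenate(feats)
```

For 16 frames of six 2-D landmarks, each frame gives 2 parts × 6 terms = 12 spatial features. The lead-lag and time-augmented temporal path is 25-dimensional, so each interval contributes 25 + 25² = 650 terms. Across 7 intervals that comes to 4550 features, which would then feed the linear classifier. The quadratic growth in path dimension shows why splitting a high-dimensional path into lower-dimensional pieces before taking signatures keeps the feature vector manageable.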
