Sparseness embedding in bending of space and time; a case study on unsupervised 3D action recognition

Abstract Human action recognition from skeletal data is one of the most popular topics in computer vision and has been widely studied in the literature, occasionally with very promising results. However, being supervised, most existing methods suffer from two major drawbacks: (1) heavy reliance on massive labeled data and (2) high sensitivity to outliers, which in turn hinder their application in real-world scenarios such as recognizing long-term and complex movements. In this paper, we propose a novel unsupervised 3D action recognition method called Sparseness Embedding, in which the spatiotemporal representation of action sequences is nonlinearly projected into an unwarped feature representation medium where, unlike in the original curved space, one can easily apply Euclidean metrics. Our strategy simultaneously integrates the nonlinearity, sparsity, and space curvature of sequences into a single objective function, leading to a more robust and highly compact representation of discriminative attributes without any need for label information. Moreover, we propose a joint learning strategy for dealing with the heterogeneity of the temporal and spatial characteristics of action sequences. Extensive experiments on six publicly available databases, namely UTKinect, TST fall, UTD-MHAD, CMU, Berkeley MHAD, and NTU RGB+D, demonstrate the superiority of our method over state-of-the-art algorithms.
