Early Recognition of 3D Human Actions

Action recognition is an important research problem in human motion analysis (HMA). In recent years, 3D observation-based action recognition has received increasing interest in the multimedia and computer vision communities, owing to the advent of cost-effective sensors such as the Kinect depth camera. This work goes one step further and focuses on early recognition of ongoing 3D human actions, which benefits a wide range of time-critical applications, e.g., gesture-based human-machine interaction and somatosensory games. Our goal is to infer the class label of a 3D human action from a partial observation of a temporally incomplete execution. Treating 3D action data as multivariate time series (m.t.s.) synchronized to a shared common clock (frames), we propose a stochastic process called the dynamic marked point process (DMP) to model a 3D action as temporal dynamic patterns that capture both timing and strength information. To recognize actions earlier and more accurately, we also explore the temporal dependency patterns between feature dimensions. A probabilistic suffix tree is constructed to represent sequential patterns among features in terms of the variable-order Markov model (VMM). Our approach and several baselines are evaluated on five 3D human action datasets. Extensive results show that our approach achieves superior performance for early recognition of 3D human actions.
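
As a rough illustration of the variable-order Markov idea mentioned above, the following toy sketch scores a partially observed symbol sequence under one model per action class and returns the most likely class. It is not the DMP model or the probabilistic suffix tree construction of this work. It assumes the features have already been quantized into a small discrete alphabet, and it uses plain longest-suffix backoff with add-one smoothing. All names (`SuffixContextModel`, `classify_prefix`, the "wave" and "push" labels, and the training sequences) are hypothetical and exist only for the example.

```python
import math
from collections import defaultdict


class SuffixContextModel:
    """Toy variable-order Markov model (illustrative only, not the paper's PST).

    Stores next-symbol counts for every context of length 0..max_order.
    """

    def __init__(self, max_order=2, alphabet_size=4):
        self.max_order = max_order
        self.alphabet_size = alphabet_size
        # counts[context][symbol] = times `symbol` followed `context` in training
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        for seq in sequences:
            for i, sym in enumerate(seq):
                # Update every suffix context that precedes position i.
                for k in range(min(i, self.max_order) + 1):
                    self.counts[tuple(seq[i - k:i])][sym] += 1
        return self

    def log_prob(self, prefix):
        """Log-likelihood of a (possibly incomplete) observed sequence."""
        total = 0.0
        for i, sym in enumerate(prefix):
            # Back off to the longest context that was seen during training.
            for k in range(min(i, self.max_order), -1, -1):
                context = tuple(prefix[i - k:i])
                if context in self.counts:
                    break
            nxt = self.counts[context]
            n = sum(nxt.values())
            # Add-one smoothing so unseen symbols keep non-zero probability.
            total += math.log((nxt.get(sym, 0) + 1) / (n + self.alphabet_size))
        return total


def classify_prefix(models, prefix):
    """Return the class label whose model best explains the observed prefix."""
    return max(models, key=lambda label: models[label].log_prob(prefix))


# Hypothetical training data: symbols 0..3 stand in for quantized feature events.
models = {
    "wave": SuffixContextModel().fit([[0, 1, 0, 1, 0, 1], [0, 1, 0, 1]]),
    "push": SuffixContextModel().fit([[0, 2, 3, 2, 3], [0, 2, 3, 3]]),
}

# Only the first two symbols of an ongoing action have been observed.
early_label = classify_prefix(models, [0, 1])
```

Because the score is accumulated symbol by symbol, the same function can be called again whenever a new frame arrives, which is what makes a likelihood-based model of this kind usable on incomplete executions.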
