Learning hierarchical spatio-temporal pattern for human activity prediction

A novel approach is proposed for learning a hierarchical spatio-temporal pattern of human actions. The spatio-temporal pattern is learned by a Hierarchical Self-Organizing Map (HSOM). The associative weights between the maps of the HSOM are obtained through Hebbian learning. Ongoing activities are predicted by a Variable-order Markov Model (VMM).

Human activity prediction has become increasingly valuable in many applications. Starting from the perspective of cognitive science, this paper presents a novel approach to learning a hierarchical spatio-temporal pattern of human activities, in order to predict ongoing activities from videos that contain only the onsets of those activities. The spatio-temporal pattern is learned by a Hierarchical Self-Organizing Map (HSOM), which consists of two self-organizing maps (an action map and an actionlet map) connected via associative links trained by Hebbian learning. Ongoing activities are predicted by a Variable-order Markov Model (VMM), which captures both large- and small-order Markov dependencies in the training actionlet sequences. Experiments on four challenging 3D action datasets captured by commodity depth cameras show promising results.
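The two learning steps named above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no update rules, so the names `hebbian_links` and `BackoffVMM`, the count-based Hebbian rule with row normalisation, and the longest-matching-context back-off are all assumptions chosen for brevity. Actionlets and actions are assumed to be already quantised to integer map-unit indices (for example, the winning units of the two self-organizing maps).

```python
from collections import Counter, defaultdict


def hebbian_links(pairs, n_actionlets, n_actions):
    """Associative weights between two maps (assumed rule, not the paper's).

    pairs: iterable of (actionlet_unit, action_unit) winners that fired together.
    Returns a row-normalised weight matrix as a list of lists.
    """
    w = [[0.0] * n_actions for _ in range(n_actionlets)]
    for actionlet, action in pairs:
        # Hebbian idea: strengthen the link between co-active units.
        w[actionlet][action] += 1.0
    for row in w:
        total = sum(row)
        if total > 0:
            for j in range(n_actions):
                row[j] /= total
    return w


class BackoffVMM:
    """Toy variable-order Markov predictor (hypothetical helper).

    Stores next-symbol counts for every context of length 0..max_order and
    predicts from the longest context seen in training.
    """

    def __init__(self, max_order=3):
        self.max_order = max_order
        self.counts = defaultdict(Counter)

    def fit(self, sequences):
        for seq in sequences:
            for i, sym in enumerate(seq):
                for k in range(self.max_order + 1):
                    if i - k < 0:
                        break
                    context = tuple(seq[i - k:i])
                    self.counts[context][sym] += 1
        return self

    def predict_next(self, history):
        history = tuple(history)
        # Try the longest usable context first, then back off to shorter ones.
        for k in range(min(self.max_order, len(history)), -1, -1):
            context = history[len(history) - k:]
            if context in self.counts:
                return self.counts[context].most_common(1)[0][0]
        return None  # model has not been trained


# Usage with made-up actionlet sequences (illustrative data only).
weights = hebbian_links([(0, 0), (0, 0), (0, 1), (1, 1)], n_actionlets=2, n_actions=2)
vmm = BackoffVMM(max_order=3).fit([[0, 1, 2, 3], [0, 1, 2, 3], [4, 1, 2, 5]])
seen_context = vmm.predict_next([4, 1, 2])    # full order-3 context was observed
backed_off = vmm.predict_next([9, 1, 2])      # falls back to the order-2 context (1, 2)
```

The back-off step is what makes the order variable: a long context is used when the training sequences support it, and a shorter one otherwise. A full VMM would typically also smooth the probabilities across orders, which this sketch omits.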
