Gesture Modeling by Hanklet-Based Hidden Markov Model

In this paper we propose a novel approach for gesture modeling. We aim at decomposing a gesture into sub-trajectories that are the output of a sequence of atomic linear time invariant (LTI) systems, and we use a Hidden Markov Model to model the transitions from the LTI system to another. For this purpose, we represent the human body motion in a temporal window as a set of body joint trajectories that we assume are the output of an LTI system. We describe the set of trajectories in a temporal window by the corresponding Hankel matrix (Hanklet), which embeds the observability matrix of the LTI system that produced it. We train a set of HMMs (one for each gesture class) with a discriminative approach. To account for the sharing of body motion templates we allow the HMMs to share the same state space. We demonstrate by means of experiments on two publicly available datasets that, even with just considering the trajectories of the 3D joints, our method achieves state-of-the-art accuracy while competing well with methods that employ more complex models and feature representations.

[1]  Fabio Cuzzolin,et al.  Learning Pullback HMM Distances. , 2013, IEEE transactions on pattern analysis and machine intelligence.

[2]  Bassam Bamieh,et al.  Identification of linear parameter varying models , 2002 .

[3]  Antti Oulasvirta,et al.  Computer Vision – ECCV 2006 , 2006, Lecture Notes in Computer Science.

[4]  Mario Fernando Montenegro Campos,et al.  STOP: Space-Time Occupancy Patterns for 3D Action Recognition from Depth Map Sequences , 2012, CIARP.

[5]  Xiaodong Yang,et al.  Recognizing actions using depth motion maps-based histograms of oriented gradients , 2012, ACM Multimedia.

[6]  Trevor Darrell,et al.  Hidden Conditional Random Fields for Gesture Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[7]  Wanqing Li,et al.  Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[8]  Amit K. Roy-Chowdhury,et al.  A “string of feature graphs” model for recognition of complex activities in natural videos , 2011, 2011 International Conference on Computer Vision.

[9]  Ramakant Nevatia,et al.  Recognition and Segmentation of 3-D Human Action Using HMM and Multi-class AdaBoost , 2006, ECCV.

[10]  Mario Sznaier,et al.  The Way They Move: Tracking Multiple Targets with Similar Appearance , 2013, 2013 IEEE International Conference on Computer Vision.

[11]  Meinard Müller,et al.  Motion templates for automatic classification and retrieval of motion capture data , 2006, SCA '06.

[12]  P. Green Reversible jump Markov chain Monte Carlo computation and Bayesian model determination , 1995 .

[13]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[14]  Rémi Ronfard,et al.  Action Recognition from Arbitrary Views using 3D Exemplars , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[15]  Bohyung Han,et al.  Scenario-based video event recognition by constraint flow , 2011, CVPR 2011.

[16]  Binlong Li,et al.  Cross-view activity recognition using Hankelets , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Zicheng Liu,et al.  HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Junji Yamato,et al.  Recognizing human action in time-sequential images using hidden Markov model , 1992, Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[19]  Alberto Del Bimbo,et al.  Space-Time Pose Representation for 3D Human Action Recognition , 2013, ICIAP Workshops.

[20]  Sangho Park,et al.  Recognition of two-person interactions using a hierarchical Bayesian network , 2003, IWVS '03.

[21]  Alex Pentland,et al.  Real-time American Sign Language recognition from video using hidden Markov models , 1995 .

[22]  Biing-Hwang Juang,et al.  Discriminative training of dynamic programming based speech recognizers , 1993, IEEE Trans. Speech Audio Process..

[23]  Sergio A. Velastin,et al.  Recognizing Human Actions Using Silhouette-based HMM , 2009, 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.

[24]  Stefano Soatto,et al.  Dynamic Textures , 2003, International Journal of Computer Vision.

[25]  Binlong Li,et al.  Activity recognition using dynamic subspace angles , 2011, CVPR 2011.

[26]  Xiaodong Yang,et al.  EigenJoints-based action recognition using Naïve-Bayes-Nearest-Neighbor , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[27]  Lawrence K. Saul,et al.  Large Margin Hidden Markov Models for Automatic Speech Recognition , 2006, NIPS.

[28]  Eduardo Sontag Nonlinear regulation: The piecewise linear approach , 1981 .

[29]  James M. Rehg,et al.  Decoding Children's Social Behavior , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Agata Rozga,et al.  Joint Alignment and Modeling of Correlated Behavior Streams , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[31]  Ying Wu,et al.  Mining actionlet ensemble for action recognition with depth cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Patrick Pérez,et al.  View-Independent Action Recognition from Temporal Self-Similarities , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Yao-Jen Chang,et al.  A Kinect-based system for physical rehabilitation: a pilot study for young adults with motor disabilities. , 2011, Research in developmental disabilities.

[34]  Ilya Sutskever,et al.  Learning Recurrent Neural Networks with Hessian-Free Optimization , 2011, ICML.

[35]  Aaron F. Bobick,et al.  Parametric Hidden Markov Models for Gesture Recognition , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[36]  Manuela M. Veloso,et al.  Conditional random fields for activity recognition , 2007, AAMAS '07.

[37]  Bart Selman,et al.  Unstructured human activity detection from RGBD images , 2011, 2012 IEEE International Conference on Robotics and Automation.

[38]  Ze-Nian Li BEYOND ACTIONS : DISCRIMINATIVE MODELS FOR CONTEXTUAL GROUP ACTIVITIES , 2010 .

[39]  Ying Wu,et al.  Robust 3D Action Recognition with Random Occupancy Patterns , 2012, ECCV.

[40]  Christian D. Schunn,et al.  Integrating perceptual and cognitive modeling for adaptive and intelligent human-computer interaction , 2002, Proc. IEEE.

[41]  Jake K. Aggarwal,et al.  View invariant human action recognition using histograms of 3D joints , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[42]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[43]  Joseph J. LaViola,et al.  Exploring the Trade-off Between Accuracy and Observational Latency in Action Recognition , 2013, International Journal of Computer Vision.

[44]  Bruno Sinopoli,et al.  Networked Sensing, Estimation and Control Systems , 2009 .

[45]  René Vidal,et al.  Identification of Hybrid Systems: A Tutorial , 2007, Eur. J. Control.

[46]  Greg Mori,et al.  Max-margin hidden conditional random fields for human action recognition , 2009, CVPR.

[47]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[48]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[49]  Emanuele Frontoni,et al.  Customers’ activity recognition in intelligent retail environments , 2013 .

[50]  F BobickAaron,et al.  Parametric Hidden Markov Models for Gesture Recognition , 1999 .