Parametric Hidden Markov Models for Gesture Recognition

A method for the representation, recognition, and interpretation of parameterized gesture is presented. By parameterized gesture we mean gestures that exhibit a systematic spatial variation; one example is a pointing gesture where the relevant parameter is the two-dimensional direction. Our approach extends the standard hidden Markov model (HMM) method of gesture recognition by including a global parametric variation in the output probabilities of the HMM states. Using a linear model of dependence, we formulate an expectation-maximization (EM) method for training the parametric HMM (PHMM). During testing, a similar EM algorithm simultaneously maximizes the output likelihood of the PHMM for the given sequence and estimates the quantifying parameters. Using visually derived and directly measured three-dimensional hand positions as input, we present results demonstrating the recognition superiority of the PHMM over standard HMM techniques, as well as greater robustness of the parameter estimates to noise in the input features. Finally, we extend the PHMM to handle arbitrary smooth (nonlinear) dependencies. The nonlinear formulation requires a generalized expectation-maximization (GEM) algorithm for both training and the simultaneous recognition of the gesture and estimation of the value of the parameter. We present results on a pointing gesture, where the nonlinear approach permits the natural spherical-coordinate parameterization of pointing direction.
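
To make the linear PHMM concrete, the sketch below shows one plausible test-time M-step for re-estimating the gesture parameter theta, assuming Gaussian output densities whose means depend linearly on theta (mu_j(theta) = W_j theta + mu0_j) and state posteriors gamma obtained from a forward-backward pass run with the current theta estimate. The variable names (W, mu0, Sigma_inv, gamma) and the NumPy implementation are illustrative assumptions, not the authors' code.

    import numpy as np

    def update_theta(x, gamma, W, mu0, Sigma_inv):
        """Closed-form re-estimate of theta given fixed state posteriors.

        Maximizes sum_{t,j} gamma[t, j] * log N(x[t]; W[j] @ theta + mu0[j], Sigma[j])
        with respect to theta, which reduces to a weighted least-squares solve.

        x         : (T, d)    observation sequence
        gamma     : (T, J)    state posteriors from forward-backward (E-step)
        W         : (J, d, p) linear dependence of each state's mean on theta
        mu0       : (J, d)    state output means at theta = 0
        Sigma_inv : (J, d, d) inverse output covariances
        """
        p = W.shape[2]
        A = np.zeros((p, p))
        b = np.zeros(p)
        for j in range(W.shape[0]):
            WtS = W[j].T @ Sigma_inv[j]                  # (p, d)
            A += gamma[:, j].sum() * (WtS @ W[j])        # accumulate normal equations
            b += WtS @ (gamma[:, j] @ (x - mu0[j]))      # posterior-weighted residuals
        return np.linalg.solve(A, b)

Alternating this update with a forward-backward pass (to refresh gamma under the new theta) until convergence yields both the sequence likelihood used for recognition and the estimate of the quantifying parameter, mirroring the testing procedure described above.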
