Nonlinear PHMMs for the interpretation of parameterized gesture

Recently we modified the hidden Markov model (HMM) framework to incorporate a global parametric variation in the output probabilities of the states of the HMM. Development of the parametric hidden Markov model (PHMM) was motivated by the task of simultaneously recognizing and interpreting gestures that exhibit meaningful variation. With standard HMMs, such global variation confounds the recognition process. The original PHMM approach assumes a linear dependence of output density means on the global parameter. In this paper we extend the PHMM to handle arbitrary smooth (nonlinear) dependencies. We present a generalized expectation-maximization (GEM) algorithm for training the PHMM and a second GEM algorithm that simultaneously recognizes the gesture and estimates the value of the parameter. We report results on a pointing gesture, where the nonlinear approach permits the natural azimuth/elevation parameterization of pointing direction.
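
The difference between a linear and a nonlinear dependence of a state's output mean on the global parameter can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the paper's formulation: the function names, the toy dimensions, the random weights, and in particular the choice of a one-hidden-layer tanh network as the smooth nonlinear map.

```python
import numpy as np

def linear_mean(W, mu_bar, theta):
    """Linear case: the state mean is an affine function of theta."""
    return W @ theta + mu_bar

def nonlinear_mean(W1, b1, W2, b2, theta):
    """Nonlinear case (assumed form): a one-hidden-layer tanh network of theta."""
    hidden = np.tanh(W1 @ theta + b1)
    return W2 @ hidden + b2

def gaussian_log_density(x, mean, cov):
    """Log of the multivariate normal density N(x; mean, cov)."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

# Hypothetical toy setup: theta = (azimuth, elevation), 3-D observations.
rng = np.random.default_rng(0)
theta = np.array([0.3, -0.1])
W, mu_bar = rng.normal(size=(3, 2)), np.zeros(3)
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)
W2, b2 = rng.normal(size=(3, 4)), np.zeros(3)
cov = np.eye(3)

# An observation generated exactly at the nonlinear state mean.
x = nonlinear_mean(W1, b1, W2, b2, theta)

# Output log-density of the same observation under each mean model.
log_p_linear = gaussian_log_density(x, linear_mean(W, mu_bar, theta), cov)
log_p_nonlinear = gaussian_log_density(
    x, nonlinear_mean(W1, b1, W2, b2, theta), cov
)
```

In a full model each state would carry its own mean function, and these per-state log-densities would replace the fixed output densities of a standard HMM. The sketch covers only that one ingredient and does not implement either GEM algorithm.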
