Real-Time Gesture Recognition by Means of Hybrid Recognizers

Recent years have seen significant efforts to develop intelligent, natural interfaces between human users and computer systems that draw on several modes of information (visual, audio, pen, etc.). These modes can be used individually or in combination. One of the most promising interaction modes for such interfaces is the user's natural gesture. In this work, we apply computer vision techniques to analyze real-time video streams of a user's freehand gestures drawn from a predefined vocabulary. We propose a set of hybrid recognizers, each of which accounts for a single gesture and consists of one hidden Markov model (HMM) whose state emission probabilities are computed by partially recurrent artificial neural networks (ANNs). The underlying idea is to exploit the strength of ANNs in capturing the nonlinear local dependencies of a gesture, while handling its temporal structure within the HMM formalism. The resulting recognition engine is more accurate than HMM-based and ANN-based recognizers used individually.
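The recognizer structure described above can be illustrated with a minimal sketch; it is not the authors' implementation. It makes several assumptions not stated in the abstract: the partially recurrent network is an Elman-style network with untrained random weights, its softmax outputs are read as state posteriors and divided by uniform state priors to obtain scaled likelihoods, each HMM has a left-to-right topology, and the class and function names (`ElmanEmitter`, `left_to_right`, `log_forward`, `classify`) as well as the gesture labels are hypothetical.

```python
import numpy as np


def log_forward(log_emis, log_trans, log_init):
    """Forward algorithm in the log domain; returns log P(sequence | model)."""
    alpha = log_init + log_emis[0]
    for frame in log_emis[1:]:
        # Sum over predecessor states, then apply this frame's emission score.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_trans, axis=0) + frame
    return float(np.logaddexp.reduce(alpha))


class ElmanEmitter:
    """Hypothetical partially recurrent network (Elman-style context units).

    Weights are random and untrained; a real system would learn them.
    """

    def __init__(self, n_in, n_hidden, n_states, rng):
        self.w_in = rng.normal(0.0, 0.5, (n_hidden, n_in))
        self.w_ctx = rng.normal(0.0, 0.5, (n_hidden, n_hidden))
        self.w_out = rng.normal(0.0, 0.5, (n_states, n_hidden))
        # Assumption: uniform state priors.
        self.log_prior = np.full(n_states, -np.log(n_states))

    def log_emissions(self, frames):
        context = np.zeros(self.w_ctx.shape[0])
        rows = []
        for x in frames:
            # The context carries the previous hidden activation forward.
            context = np.tanh(self.w_in @ x + self.w_ctx @ context)
            z = self.w_out @ context
            log_post = z - np.logaddexp.reduce(z)  # log-softmax over states
            # Assumption: posterior / prior serves as a scaled likelihood.
            rows.append(log_post - self.log_prior)
        return np.array(rows)


def left_to_right(n_states, stay=0.6):
    """Log transition matrix and log initial vector of a left-to-right HMM."""
    trans = np.zeros((n_states, n_states))
    for s in range(n_states):
        if s + 1 < n_states:
            trans[s, s], trans[s, s + 1] = stay, 1.0 - stay
        else:
            trans[s, s] = 1.0
    init = np.zeros(n_states)
    init[0] = 1.0
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended here
        return np.log(trans), np.log(init)


def classify(frames, recognizers):
    """Score the sequence with every gesture's hybrid model; pick the best."""
    scores = {
        name: log_forward(emitter.log_emissions(frames), log_trans, log_init)
        for name, (emitter, log_trans, log_init) in recognizers.items()
    }
    return max(scores, key=scores.get), scores


rng = np.random.default_rng(0)
# One hybrid recognizer per gesture in a hypothetical three-gesture vocabulary.
recognizers = {
    name: (ElmanEmitter(4, 6, 3, rng), *left_to_right(3))
    for name in ("wave", "point", "stop")
}
frames = rng.normal(size=(10, 4))  # stand-in for per-frame visual features
best, scores = classify(frames, recognizers)
print(best, scores)
```

Because the networks here are untrained, the winning label is arbitrary; the sketch only shows how per-frame network outputs feed the HMM forward recursion and how the per-gesture scores are compared.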
