Exploiting speech/gesture co-occurrence for improving continuous gesture recognition in weather narration

In order to incorporate naturalness in the design of human-computer interfaces (HCI), it is desirable to develop recognition techniques capable of handling continuous natural gesture and speech inputs. Although many researchers have reported high recognition rates for gesture recognition using hidden Markov models (HMMs), the gestures used are mostly pre-defined and bound by syntactic and grammatical constraints. Natural gestures, however, do not string together under such syntactic bindings. Moreover, strict classification of natural gestures is not feasible. We have examined hand gestures made in a very natural domain: a weather person narrating in front of a weather map. The gestures made by the weather person are embedded in the narration. This provides us with abundant data from an uncontrolled environment for studying the interaction between speech and gesture in the context of a display. We hypothesize that this domain is very similar to that of a natural human-computer interface. We present an HMM-based framework for continuous gesture recognition and keyword spotting. To explore the relationship between gesture and speech, we conducted a statistical co-occurrence analysis of different gestures with a selected set of spoken keywords. We then demonstrate how this co-occurrence analysis can be exploited to improve the performance of continuous gesture recognition.
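As a rough illustration of the idea (a minimal sketch, not the paper's implementation), the Python fragment below estimates gesture/keyword co-occurrence probabilities from hypothetical time-aligned annotations and uses them to rescore HMM gesture likelihoods when a keyword is spotted in the same time window. The interval representation, the temporal window, and the blending weight alpha are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical time-aligned annotations: (label, start_time, end_time) in seconds.
gesture_intervals = [("point", 1.2, 2.0), ("contour", 3.5, 4.4), ("point", 6.1, 6.8)]
keyword_intervals = [("here", 1.4, 1.7), ("across", 3.6, 3.9), ("this", 6.2, 6.5)]

def overlaps(a, b, window=0.5):
    """True if interval a falls within `window` seconds of interval b."""
    return a[1] < b[2] + window and b[1] < a[2] + window

# 1. Estimate co-occurrence probabilities P(gesture | keyword) from the corpus:
#    the fraction of keyword tokens that temporally co-occur with each gesture class.
counts = defaultdict(lambda: defaultdict(int))
keyword_totals = defaultdict(int)
for kw, ks, ke in keyword_intervals:
    keyword_totals[kw] += 1
    for g, gs, ge in gesture_intervals:
        if overlaps((g, gs, ge), (kw, ks, ke)):
            counts[kw][g] += 1

p_gesture_given_keyword = {
    kw: {g: c / keyword_totals[kw] for g, c in gmap.items()}
    for kw, gmap in counts.items()
}

# 2. Rescore HMM outputs: blend each gesture's HMM likelihood with the
#    co-occurrence prior for a keyword spotted in the same time window.
def rescore(hmm_likelihoods, spotted_keyword, alpha=0.7):
    """hmm_likelihoods: dict gesture -> score from the gesture HMMs (assumed normalized)."""
    prior = p_gesture_given_keyword.get(spotted_keyword, {})
    return {
        g: alpha * lik + (1 - alpha) * prior.get(g, 0.0)
        for g, lik in hmm_likelihoods.items()
    }

# Example: the keyword "here" was spotted near an ambiguous gesture segment.
print(rescore({"point": 0.4, "contour": 0.35, "rest": 0.25}, "here"))
```

In this sketch the linear blend is simply one convenient way to let speech evidence bias the gesture decision; the actual weighting scheme and window size would have to be estimated from the weather narration data.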