Speech/Gesture Interface to a Visual-Computing Environment

We developed a speech/gesture interface that uses visual hand-gesture analysis and speech recognition to control a 3D display in VMD, a virtual environment for structural biology. Working within a specific virtual-environment context provided the constraints needed to make our analysis robust and let us develop a command language that optimally combines speech and gesture inputs. The interface combines automatic speech recognition (ASR), with a microphone, to recognize voice commands; two strategically positioned cameras to observe the user's hands; and automatic gesture recognition (AGR), a set of computer vision techniques that interprets the hand gestures. The vision algorithms extract the user's hand from the background, detect individual finger positions, and distinguish meaningful gestures from unintentional hand movements.

Our main goal was to simplify model manipulation and rendering and thus make biomolecular modeling more playful. Researchers can explore variations of a model and concentrate on the biomolecular aspects of their task without being distracted by computational details. They can view molecular dynamics simulations, experiment with different combinations of molecular structures, and better understand the molecules' important properties. One potential benefit, for example, is reducing the time needed to discover new compounds for new drugs.
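To make the pipeline concrete, the sketch below illustrates one way such a system could be structured; it is not the authors' implementation. It assumes OpenCV and NumPy, uses illustrative HSV skin-color thresholds for hand extraction, a rough convexity-defect finger count for gesture classification, and a hypothetical command vocabulary for fusing a spoken keyword with the concurrent gesture.

```python
# Minimal sketch (assumptions noted above), not the paper's actual algorithms.
import cv2
import numpy as np

SKIN_LO = np.array([0, 40, 60], dtype=np.uint8)     # assumed HSV lower bound
SKIN_HI = np.array([25, 180, 255], dtype=np.uint8)  # assumed HSV upper bound

def extract_hand(frame_bgr):
    """Return the largest skin-colored contour, or None if no hand is found."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, SKIN_LO, SKIN_HI)
    mask = cv2.medianBlur(mask, 5)                   # suppress speckle noise
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)
    # Reject small blobs so stray skin-colored pixels are not mistaken for a hand.
    return hand if cv2.contourArea(hand) > 2000 else None

def count_extended_fingers(hand_contour):
    """Rough finger count from convexity defects (e.g., pointing vs. open hand)."""
    hull = cv2.convexHull(hand_contour, returnPoints=False)
    defects = cv2.convexityDefects(hand_contour, hull)
    if defects is None:
        return 0
    # Deep defects correspond to gaps between fingers; depth is in 1/256-pixel units.
    deep = sum(1 for d in defects[:, 0] if d[3] / 256.0 > 20)
    return deep + 1 if deep > 0 else 0

def fuse(spoken_keyword, finger_count):
    """Pair a recognized spoken keyword with the concurrent gesture (hypothetical commands)."""
    if spoken_keyword == "rotate" and finger_count == 1:
        return "rotate view about the axis indicated by the pointing finger"
    if spoken_keyword == "stop":
        return "freeze current view"
    return None  # unmatched input is treated as an unintentional movement
```

In a real system the speech recognizer and the gesture tracker run continuously and asynchronously, so the fusion step would also have to align their outputs in time before issuing a display command.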
