A Multimodal Framework for Interacting with Virtual Environments

Although there has been tremendous progress in recent years in 3-D, immersive display and virtual reality (VR) technologies, the corresponding interface technologies have lagged behind. To fully exploit the potential that VR offers as a means of visualizing and interacting with complex information, it is important to develop "natural" means of interacting with the virtual display. Such natural interaction can be achieved through an integrated approach in which multiple, possibly redundant modes of input, such as speech, hand gesture, gaze, and graphical feedback, are used simultaneously. This paper presents a conceptual framework for multimodal human-computer interaction for manipulating virtual objects. Specific techniques are presented for using a combination of speech and gesture to manipulate virtual objects. Free-hand gestures are analyzed and recognized using computer vision, and the gesture analysis is performed cooperatively with the speech recognition system and the graphics system. This is demonstrated with the help of an experimental VR setup used by molecular biologists for simulating and visualizing complex molecular structures.
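To make the integration idea concrete, the following is a minimal sketch of how a spoken command and a concurrent hand gesture might be fused into a single manipulation command. It is not the paper's actual system; all names (SpeechEvent, GestureEvent, fuse, the command vocabulary, and the time-window parameter) are assumptions introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical sketch: speech supplies *what* to do, the vision-recognized
# gesture supplies *where* or *which*. The two are combined only when they
# co-occur within a short time window.

@dataclass
class SpeechEvent:
    command: str                           # e.g. "select", "move"
    t: float                               # timestamp in seconds

@dataclass
class GestureEvent:
    kind: str                              # e.g. "point", "grab"
    position: Tuple[float, float, float]   # 3-D hand position from vision
    t: float                               # timestamp in seconds

def fuse(speech: SpeechEvent, gesture: GestureEvent,
         window: float = 1.0) -> Optional[dict]:
    """Return a manipulation command if the two modalities align in time."""
    if abs(speech.t - gesture.t) > window:
        return None                        # modalities too far apart
    if speech.command == "select" and gesture.kind == "point":
        return {"action": "select", "target_at": gesture.position}
    if speech.command == "move" and gesture.kind == "grab":
        return {"action": "move", "to": gesture.position}
    return None                            # no matching speech/gesture pair

if __name__ == "__main__":
    s = SpeechEvent(command="select", t=10.2)
    g = GestureEvent(kind="point", position=(0.4, 1.1, -2.0), t=10.5)
    print(fuse(s, g))  # {'action': 'select', 'target_at': (0.4, 1.1, -2.0)}
```

The design point the sketch illustrates is redundancy across modalities: neither the speech event nor the gesture alone is sufficient, and temporal proximity is what licenses combining them.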
