Put that where? Voice and gesture at the graphics interface

A person stands in front of a large projection screen showing a checked floor. They say, "Make a table," and a wooden table appears in the middle of the floor.

"On the table, place a vase," they say, gesturing with a fist relative to the palm of their other hand to show where the vase should sit on the table. A vase appears at the correct location.

"Next to the table place a chair." A chair appears to the right of the table.

"Rotate it like this," they say, and rotating their hand causes the chair to turn towards the table.

"View the scene from this direction," they say while pointing one hand towards the palm of the other. The scene rotates to match their hand orientation.

In a matter of moments, a simple scene has been created using natural speech and gesture. The interface of the future? Not at all; Koons, Thorisson and Bolt demonstrated this work in 1992 [23]. Although research such as this has shown the value of combining speech and gesture at the interface, most computer graphics are still being developed with tools no more intuitive than a mouse and keyboard. This need not be the case. Current speech and gesture technologies make multimodal interfaces with combined voice and gesture input easily achievable. Several commercial continuous dictation packages are currently available, and tablets and pens are widely supported in graphics applications. However, having this capability doesn't mean that voice and gesture should be added to every modeling package in a haphazard manner. Numerous issues must be addressed in order to develop an intuitive interface that uses the strengths of both input modalities.

In this article we describe motivations for adding voice and gesture to graphical applications, review previous work showing different ways these modalities may be used, and outline some general interface guidelines. Finally, we give an overview of promising areas for future research. Our motivation for writing this is to spur developers to build compelling interfaces that will make speech and gesture as common on the desktop as the keyboard and mouse.

[1] Sharon L. Oviatt, et al. Multimodal interfaces for dynamic interactive maps, 1996, CHI.

[2] David Zeltzer, et al. A design method for “whole-hand” human-computer interaction, 1993, TOIS.

[3] Dylan M. Jones, et al. Design guidelines for speech recognition interfaces, 1989, Applied ergonomics.

[4] Ronald M. Baecker, et al. Readings in human-computer interaction: a multidisciplinary approach, 1988.

[5] Gale Martin, et al. The Utility of Speech Input in User-Computer Interfaces, 1989, Int. J. Man Mach. Stud.

[6] Alexander G. Hauptmann, et al. Gestures with Speech for Graphic Manipulation, 1993, Int. J. Man Mach. Stud.

[7] Richard A. Bolt, et al. Multi-modal natural dialogue, 1992, CHI '92.

[8] Jane Wilhelms, et al. Put: language-based interactive manipulation of objects, 1996, IEEE Computer Graphics and Applications.

[9] Joëlle Coutaz, et al. Towards automatic evaluation of multimodal user interfaces, 1993, IUI '93.

[10] Sharon L. Oviatt, et al. Unification-based Multimodal Integration, 1997, ACL.

[11] Sharon L. Oviatt, et al. User-Centered Modeling for Spoken Language and Multimodal Interfaces, 1996, IEEE Multim.

[12] Stuart C. Shapiro, et al. Intelligent multi-media interface technology, 1991.

[13] Mark W. Salisbury, et al. Talk and draw: bundling speech and graphics, 1990, Computer.

[14] S. Kicha Ganapathy, et al. A synthetic visual environment with hand gesturing and voice input, 1989, CHI '89.

[15] A. Kendon. Gesticulation and Speech: Two Aspects of the Process of Utterance, 1981.

[16] Paul F. Kirvan. Conversing with computers, 1984.

[17] Alex Waibel, et al. Modeling and Interpreting Multimodal Inputs: A Semantic Integration Approach, 1997.

[18] Philip R. Cohen. The role of natural language in a multimodal interface, 1992, UIST '92.

[19] Mark E. Lucente, et al. Visualization Space: A Testbed for Deviceless Multimodal User Interface, 1998.

[20] Michel Beaudouin-Lafon, et al. Charade: remote control of objects using free-hand gestures, 1993, CACM.

[21] Joëlle Coutaz, et al. A generic platform for addressing the multimodal challenge, 1995, CHI '95.

[22] Elaine Marsh, et al. Human-Machine Dialogue for Multi-Modal Decision Support Systems, 1994.

[23] Kristinn R. Thórisson, et al. Integrating Simultaneous Input from Speech, Gaze, and Hand Gestures, 1991, AAAI Workshop on Intelligent Multimedia Interfaces.