On the Relationships Among Speech, Gestures, and Object Manipulation in Virtual Environments: Initial Evidence

This chapter reports on a study whose goal was to investigate how people use gestures and spoken utterances while playing a videogame without the support of standard input devices. We employ a Wizard of Oz technique to collect audio-, video-, and body-movement data on people's free use of gesture and speech input. Data were collected from ten subjects for up to 60 minutes of game interaction each. We report on preferential mode use, as well as on the predictability of gestures from the objects in the scene. The long-term goal of this ongoing study is to collect natural and reliable data from different input modalities that could serve as training data for the design and development of a robust multimodal recognizer.
