A map-based system using speech and 3D gestures for pervasive computing

We describe an augmentation of QuickSet, a multimodal voice/pen system that allows users to create and control map-based, collaborative, interactive simulations. In this paper, we report on our extension of the graphical pen input mode from stylus/mouse to 3D hand movements. To do this, the map is projected onto a virtual plane in space that the operator specifies before the start of the interactive session. We then use our geometric model to compute the intersection of hand movements with this virtual plane and translate the intersection points into map coordinates on the appropriate system. The goal of this research is a body-centered, multimodal architecture that employs both speech and 3D hand gestures and seamlessly, unobtrusively supports distributed interaction. The augmented system, built on top of an existing architecture, also provides improved visualization, management, and awareness of a shared understanding. Potential applications of this work include telemedicine, battlefield management, and any kind of collaborative decision-making in which users may wish to be mobile.
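The abstract does not spell out the geometric model, so the following is only a minimal sketch of one way the ray/plane intersection could be computed: a hand-pointing ray is intersected with the operator-defined plane and the hit point is expressed in normalized map coordinates. The function name, parameters, and the assumption that the plane is spanned by two orthogonal axis vectors are ours, not the authors'.

```python
import numpy as np

def hand_ray_to_map_coords(ray_origin, ray_dir, plane_origin, plane_u, plane_v):
    """Intersect a hand-pointing ray with the virtual map plane.

    plane_origin: one corner of the projected map in world space.
    plane_u, plane_v: orthogonal in-plane vectors spanning the map's
        width and height (their lengths give the map's extent).
    Returns (u, v) in [0, 1] map coordinates, or None if the ray misses.
    """
    normal = np.cross(plane_u, plane_v)
    denom = np.dot(normal, ray_dir)
    if abs(denom) < 1e-9:                      # ray parallel to the plane
        return None
    t = np.dot(normal, plane_origin - ray_origin) / denom
    if t < 0:                                  # plane lies behind the hand
        return None
    hit = ray_origin + t * ray_dir             # 3D intersection point
    rel = hit - plane_origin
    # Project onto the plane axes (valid because plane_u is orthogonal to plane_v).
    u = np.dot(rel, plane_u) / np.dot(plane_u, plane_u)
    v = np.dot(rel, plane_v) / np.dot(plane_v, plane_v)
    if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0:
        return u, v                            # normalized map coordinates
    return None                                # pointing outside the map


# Example: a 2 m x 1.5 m virtual plane placed 1 m in front of the user.
if __name__ == "__main__":
    print(hand_ray_to_map_coords(
        ray_origin=np.array([0.0, 1.2, 0.0]),
        ray_dir=np.array([0.1, -0.1, 1.0]),
        plane_origin=np.array([-1.0, 0.5, 1.0]),
        plane_u=np.array([2.0, 0.0, 0.0]),
        plane_v=np.array([0.0, 1.5, 0.0]),
    ))  # -> (0.55, 0.4)
```

The normalized (u, v) pair would then be scaled to the coordinate system of whichever map display receives the gesture.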
