A Spoken Dialogue System for Navigation in Non‐Immersive Virtual Environments

Navigation is the process by which people control their movement in virtual environments and is a core functional requirement for all virtual environment (VE) applications. Users require the ability to move, controlling orientation, direction of movement and speed, in order to achieve a particular goal within a VE. Navigation is rarely an end in itself (the end point is typically interaction with the visual representations of data), but applications often place a high demand on navigation skills, which in turn means that a high level of support for navigation is required from the application. On desktop, non-immersive systems, navigation is usually supported through the usual hardware devices of mouse and keyboard. Previous work by the authors shows that many users experience frustration when trying to perform even simple navigation tasks: users complain about getting lost, becoming disorientated and finding the interface 'difficult to use'. In this paper we report on work in progress in exploiting natural language processing (NLP) technology to support navigation in non-immersive virtual environments. A multi-modal system has been developed which supports a range of high-level (spoken) navigation commands, and indications are that spoken dialogue interaction is an effective alternative to mouse and keyboard interaction for many tasks. We conclude that multi-modal interaction, combining technologies such as NLP with mouse and keyboard, may offer the most effective interaction with VEs, and we identify a number of areas where further work is necessary.
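The abstract does not specify how spoken commands are grammatically structured or executed; as a purely illustrative sketch (all names here are hypothetical, not taken from the system described), a high-level spoken command such as "turn left" or "move forward" might be mapped to desktop-VE viewpoint updates along these lines:

```python
import math

class Viewpoint:
    """Hypothetical viewpoint state for a non-immersive (desktop) VE:
    a position (x, z) on the ground plane and a heading in degrees."""

    def __init__(self):
        self.x, self.z, self.heading = 0.0, 0.0, 0.0

    def apply(self, command: str) -> None:
        """Map a recognised high-level spoken command to a viewpoint update."""
        word = command.lower().strip()
        if word == "turn left":
            self.heading = (self.heading + 90) % 360
        elif word == "turn right":
            self.heading = (self.heading - 90) % 360
        elif word == "move forward":
            # Advance one unit along the current heading.
            self.x += math.sin(math.radians(self.heading))
            self.z += math.cos(math.radians(self.heading))
        else:
            raise ValueError(f"unrecognised command: {command}")

vp = Viewpoint()
vp.apply("turn left")
vp.apply("move forward")
```

In practice a spoken dialogue system would sit in front of such a mapping, with a recogniser producing the command strings and a dialogue manager resolving ambiguous or out-of-vocabulary utterances before any viewpoint change is made.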
