A Software Framework to Create 3D Browser-Based Speech Enabled Applications

Advances in automatic speech recognition have pushed human-computer interface researchers to adopt speech as one means of input. Speech is natural to humans and complements other input interfaces very well. However, integrating an automatic speech recognizer into a complex system (such as a 3D visualization system or a Virtual Reality system) can be a difficult and time-consuming task. In this paper we present our approach to the problem: a software framework requiring minimal additional coding from the application developer. The framework combines voice commands with existing interaction code, automating the task of creating a new speech grammar (to be used by the recognizer). A new listener component for the Xj3D browser was created, which makes the integration between the 3D browser and the recognizer transparent to the user. We believe this is a desirable feature for virtual reality system developers, and that the framework can also serve as a rapid prototyping tool when experimenting with speech technology.
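To make the described pattern concrete, the following is a minimal sketch of the idea the abstract outlines: spoken phrases are bound to existing interaction handlers, and a recognizer grammar (here in JSGF, the format used by Java recognizers such as Sphinx-4) is generated automatically from the registered phrases. All class and method names below (`VoiceCommandRegistry`, `register`, `onRecognized`, `toJsgf`) are assumptions for illustration, not the framework's actual API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal sketch (all names hypothetical) of binding voice commands
 * to existing interaction code and auto-generating a JSGF grammar
 * for the speech recognizer from the registered phrases.
 */
public class VoiceCommandRegistry {

    private final Map<String, Runnable> commands = new LinkedHashMap<>();

    /** Bind a spoken phrase to an existing interaction handler. */
    public void register(String phrase, Runnable action) {
        commands.put(phrase, action);
    }

    /** Invoked by the recognizer's listener when a phrase is recognized. */
    public void onRecognized(String phrase) {
        Runnable action = commands.get(phrase);
        if (action != null) {
            action.run();
        }
    }

    /** Generate a JSGF grammar covering every registered phrase. */
    public String toJsgf() {
        StringBuilder sb = new StringBuilder("#JSGF V1.0;\n");
        sb.append("grammar commands;\n");
        sb.append("public <command> = ");
        sb.append(String.join(" | ", commands.keySet()));
        sb.append(";\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        VoiceCommandRegistry registry = new VoiceCommandRegistry();
        // Existing interaction code is reused as-is; these stubs stand in
        // for a 3D browser's navigation handlers.
        registry.register("zoom in", () -> System.out.println("zooming in"));
        registry.register("rotate left", () -> System.out.println("rotating left"));

        System.out.println(registry.toJsgf());
        registry.onRecognized("zoom in"); // simulate a recognizer callback
    }
}
```

In this arrangement the application developer only registers phrase-to-handler pairs; the grammar the recognizer needs is derived from those registrations, which is the "minimal additional coding" property the abstract claims.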
