The Vocal Joystick: A Voice-Based Human-Computer Interface for Individuals with Motor Impairments

We present a novel voice-based human-computer interface designed to enable individuals with motor impairments to use vocal parameters for continuous control tasks. Since discrete spoken commands are ill-suited to such tasks, our interface exploits a large set of continuous acoustic-phonetic parameters, such as pitch, loudness, and vowel quality. The parameter set is chosen to optimize automatic recognizability, communication bandwidth, learnability, suitability, and ease of use. Parameters are extracted in real time, transformed via adaptation and acceleration, and converted into continuous control signals. This paper describes the basic engine, prototype applications (in particular, voice-based web browsing and a controlled trajectory-following task), and initial user studies confirming the feasibility of this technology.
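To make the extract-transform-control pipeline concrete, the following is a minimal sketch of one control loop iteration. It is an illustration only, not the paper's implementation: the vowel-to-direction mapping, the acceleration exponent, and all function names are assumptions chosen here for clarity (loudness drives cursor speed through a nonlinear acceleration function, while the recognized vowel selects a direction on the 2-D plane).

```python
# Hedged sketch of a Vocal Joystick-style control loop.
# The vowel-to-direction table and the power-law acceleration
# are illustrative assumptions, not the paper's exact design.
import numpy as np

# Hypothetical mapping from vowel classes to unit direction vectors:
# front/back vowel quality -> x axis, open/close -> y axis.
VOWEL_DIRECTIONS = {
    "iy": np.array([1.0, 0.0]),    # "ee": move right
    "aa": np.array([-1.0, 0.0]),   # "ah": move left
    "uw": np.array([0.0, -1.0]),   # "oo": move down
    "ae": np.array([0.0, 1.0]),    # "a" as in "cat": move up
}

def rms_loudness(frame: np.ndarray) -> float:
    """Root-mean-square energy of one audio frame."""
    return float(np.sqrt(np.mean(frame ** 2)))

def accelerate(speed: float, floor: float = 0.01, gamma: float = 2.0) -> float:
    """Nonlinear acceleration: quiet vocalizations give fine control,
    loud ones give fast, coarse movement (power-law transfer)."""
    return max(speed - floor, 0.0) ** gamma

def control_signal(frame: np.ndarray, vowel: str) -> np.ndarray:
    """Convert one audio frame plus a recognized vowel label
    into a 2-D cursor-velocity vector."""
    direction = VOWEL_DIRECTIONS.get(vowel, np.zeros(2))
    return accelerate(rms_loudness(frame)) * direction

# Example: a loud sustained "ee" pushes the cursor to the right;
# the same vowel spoken softly moves it only slightly.
frame = 0.5 * np.sin(2 * np.pi * 300 * np.arange(512) / 16000)
print(control_signal(frame, "iy"))
```

The separation of loudness (speed) from vowel quality (direction) mirrors the paper's use of orthogonal acoustic dimensions as independent continuous control channels.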
