The Vocal Joystick

The Vocal Joystick is a novel human-computer interface that enables individuals with motor impairments to use vocal parameters to control objects on a computer screen (buttons, sliders, etc.) and, ultimately, electro-mechanical instruments (e.g., robotic arms and wireless home-automation devices). We have developed a working prototype of our "VJ-engine" with which individuals can now control computer mouse movement with their voice. The core engine is currently optimized according to a number of criteria. In this paper, we describe the engine's system design, its optimization, and user-interface improvements, and outline the signal processing and pattern recognition modules that proved successful. Lastly, we present new results comparing the Vocal Joystick with a state-of-the-art eye-tracking pointing device, and show that the Vocal Joystick is not only already competitive but, for some tasks, appears to be an improvement.
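
The abstract does not spell out the control mapping, but the Vocal Joystick work described here steers the cursor with vowel quality while vocal energy (loudness) controls speed. The Python sketch below is a minimal illustration of that idea only: the vowel-to-direction table, the loudness normalization, and the gain constant are all assumptions made for the example, not the published engine's configuration.

    import math

    # Hypothetical vowel-to-angle map. The real engine classifies vowels
    # from acoustic frames; the particular assignment below is illustrative.
    VOWEL_ANGLES = {
        "iy": 90.0,   # "ee" -> up (assumed)
        "aa": 270.0,  # "ah" -> down (assumed)
        "uw": 180.0,  # "oo" -> left (assumed)
        "ae": 0.0,    # "ae" -> right (assumed)
    }

    def cursor_velocity(vowel: str, loudness: float,
                        gain: float = 400.0):
        """Map a classified vowel and a normalized loudness in [0, 1]
        to a (dx, dy) cursor velocity in pixels per second. Loudness
        scales speed proportionally; the gain constant is made up."""
        angle = math.radians(VOWEL_ANGLES[vowel])
        speed = gain * loudness
        # Screen y grows downward, hence the negated sine term.
        return speed * math.cos(angle), -speed * math.sin(angle)

    # One frame of an imagined real-time loop: an acoustic frontend would
    # supply the vowel label and loudness from short (~10 ms) frames.
    dx, dy = cursor_velocity("ae", loudness=0.5)
    print(f"move cursor by ({dx:.1f}, {dy:.1f}) px/s")

In a real system this per-frame velocity would be integrated into cursor position continuously while the user vocalizes, which is what allows smooth, joystick-like steering rather than discrete command-and-wait interaction.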
