论文信息 - Augmenting user interfaces with adaptive speech commands

Augmenting user interfaces with adaptive speech commands

We present a system that augments any unmodified Java application with an adaptive speech interface. The augmented system learns to associate spoken words and utterances with interface actions such as button clicks. Speech learning is constantly active and searches for correlations between what the user says and does. Training the interface is seamlessly integrated with using the interface. As the user performs normal actions, she may optionally verbally describe what she is doing. By using a phoneme recognizer, the interface is able to quickly learn new speech commands. Speech commands are chosen by the user and can be recognized robustly due to accurate phonetic modelling of the user's utterances and the small size of the vocabulary learned for a single application. After only a few examples, speech commands can replace mouse clicks. In effect, selected interface functions migrate from keyboard and mouse to speech. We demonstrate the usefulness of this approach by augmenting jfig, a drawing application, where speech commands save the user from the distraction of having to use a tool palette.

Deb Roy | Peter Gorniak | Peter Gorniak | D. Roy

[1] Deb Roy,et al. A visually grounded natural language interface for reference to spatial scenes , 2003, ICMI '03.

[2] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[3] Raj Reddy,et al. Large-vocabulary speaker-independent continuous speech recognition: the sphinx system , 1988 .

[4] Michael J. Carey,et al. Statistical models for topic identification using phoneme substrings , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[5] Anthony J. Robinson,et al. An application of recurrent nets to phone probability estimation , 1994, IEEE Trans. Neural Networks.

[6] Stephen A. Dyer,et al. Digital signal processing , 2018, 8th International Multitopic Conference, 2004. Proceedings of INMIC 2004..

[7] Alex Pentland,et al. Learning words from sights and sounds: a computational model , 2002, Cogn. Sci..

[8] Hynek Hermansky,et al. RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[9] Sharon L. Oviatt,et al. Designing the User Interface for Multimodal Speech and Pen-Based Gesture Applications: State-of-the-Art Systems and Future Research Directions , 2000, Hum. Comput. Interact..