Paper information - Voice as sound: using non-verbal voice input for interactive control

Voice as sound: using non-verbal voice input for interactive control

We describe the use of non-verbal features in voice for direct control of interactive applications. Traditional speech recognition interfaces are based on an indirect, conversational model: first the user gives a direction, and then the system performs a certain operation. Our goal is to achieve more direct, immediate interaction, like using a button or joystick, by using lower-level features of voice such as pitch and volume. We are developing several prototype interaction techniques based on this idea, such as "control by continuous voice," "rate-based parameter control by pitch," and "discrete parameter control by tonguing." We have implemented several prototype systems, and they suggest that voice-as-sound techniques can enhance traditional voice recognition approaches.
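As a rough illustration of how "control by continuous voice" and "rate-based parameter control by pitch" might fit together, the sketch below updates a parameter only while voice is present and changes it at a rate proportional to how far the pitch has moved from where the utterance started. The abstract does not specify the actual algorithm, so everything here is an assumption made for illustration: the function name `rate_based_control`, the per-frame `(volume, pitch_hz)` input, the `gain` and `volume_threshold` parameters, and the choice of the first voiced frame as the pitch baseline.

```python
from typing import Iterable, List, Optional, Tuple

# One analysis frame: (volume, pitch in Hz or None when no pitch is detected).
Frame = Tuple[float, Optional[float]]


def rate_based_control(
    frames: Iterable[Frame],
    start_value: float = 0.0,
    gain: float = 0.01,              # hypothetical: parameter units per Hz per frame
    volume_threshold: float = 0.1,   # hypothetical: minimum volume treated as voice
) -> List[float]:
    """Return the parameter value after each frame.

    The parameter changes only while the user keeps producing voice
    (continuous voice control). While voiced, it moves at a rate
    proportional to the pitch offset from the first voiced frame of
    the current utterance (rate-based control by pitch).
    """
    value = start_value
    reference_pitch: Optional[float] = None
    history: List[float] = []

    for volume, pitch_hz in frames:
        voiced = volume >= volume_threshold and pitch_hz is not None
        if not voiced:
            # Voice stopped: the parameter holds and the baseline resets.
            reference_pitch = None
        else:
            if reference_pitch is None:
                # The first voiced frame of an utterance sets the baseline.
                reference_pitch = pitch_hz
            value += gain * (pitch_hz - reference_pitch)
        history.append(value)

    return history


if __name__ == "__main__":
    demo = [(0.0, None), (0.5, 200.0), (0.5, 210.0), (0.5, 220.0), (0.0, None)]
    print(rate_based_control(demo, gain=0.5))
```

In this sketch, raising the pitch during an utterance speeds up the change, holding the starting pitch leaves the value steady, and going silent stops the change immediately, which mirrors the button-like immediacy the abstract aims for.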

Takeo Igarashi | John F. Hughes

[1] Alexander H. Waibel, et al. Towards spontaneous speech recognition for on-board car navigation and information systems, 1999, EUROSPEECH.

[2] Keikichi Hirose, et al. Prosodic word boundary detection using statistical modeling of moraic fundamental frequency contours and its use for continuous speech recognition, 1999, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99).

[3] Takeo Igarashi, et al. Speed-dependent automatic zooming for browsing large documents, 2000, UIST '00.

[4] Masataka Goto, et al. Speech Completion: New Speech Interface with On-demand Completion Assistance, 2001.

[5] Johan Bos, et al. Giving prosody a meaning, 1997, EUROSPEECH.

[6] Kazuhiko Ozeki, et al. Effectiveness of prosodic features in syntactic analysis of read Japanese sentences, 2000, INTERSPEECH.

[7] Bill Z. Manaris, et al. An Intelligent Interface for Keyboard and Mouse Control -- Providing Full Access to PC Functionality via Speech, 2001, FLAIRS Conference.

[8] Nigel Ward, et al. Responding to subtle, fleeting changes in the user's internal state, 2001, CHI.