A multimodal, multilingual telephone application: the Wildfire electronic assistant

This paper describes how a telephone-based application can perform a variety of tasks in a completely hands-free manner. The overall architecture of the speech component is multimodal in that each recognition mode is tailored to a specific need of the interface. We describe the various modes as well as the underlying core technology. To illustrate the effectiveness of the implementation, we present experimental results in American English, UK English, and French on a variety of benchmarks, including live data collected during actual use of the system.
