City name recognition over the telephone

The authors present a neural-network-based speech recognition system for telephone speech. A neural network classifier provides phoneme probabilities for each frame of the utterance, and a dynamic programming algorithm then finds the most probable sequence of words. The classifier was trained on a spoken name corpus that contained the test vocabulary and many other words. The test set consisted of 262 utterances containing 44 cities and 2 states. The best result obtained on the test set was 92.9% word accuracy (90.1% on the city names alone). Removing phoneme duration constraints reduced recognition accuracy to 82%. Performance fell to 82.4% when the network was trained on a large-vocabulary, fluent-speech corpus. Several other experiments are reported that did not produce significant changes in system performance.
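The abstract does not spell out the search procedure, so the following is only a minimal sketch of the general idea: frame-level phoneme log probabilities are aligned to each word's phoneme sequence by dynamic programming, with a limit on how many frames each phoneme may span. It is simplified to a single word per utterance rather than a word sequence. The names `score_word`, `recognize`, `min_dur`, `max_dur`, and the lexicon format are assumptions made for illustration, not the authors' implementation.

```python
import math

def score_word(log_probs, phonemes, min_dur=1, max_dur=None):
    """Best log probability of aligning all frames to a phoneme sequence.

    log_probs: one dict per frame, mapping phoneme label -> log probability
               (standing in for the classifier output; format is assumed).
    phonemes:  the word's pronunciation as a list of phoneme labels.
    min_dur, max_dur: hypothetical per-phoneme duration limits, in frames.
    """
    n_frames = len(log_probs)
    max_dur = max_dur or n_frames
    neg_inf = -math.inf
    # best[j][t]: best score with the first j phonemes covering the first t frames
    best = [[neg_inf] * (n_frames + 1) for _ in range(len(phonemes) + 1)]
    best[0][0] = 0.0
    for j, ph in enumerate(phonemes, start=1):
        for t in range(1, n_frames + 1):
            seg = 0.0
            # Try every duration d for phoneme j, ending just before frame t.
            for d in range(1, min(max_dur, t) + 1):
                seg += log_probs[t - d][ph]
                if d >= min_dur and best[j - 1][t - d] > neg_inf:
                    best[j][t] = max(best[j][t], best[j - 1][t - d] + seg)
    return best[len(phonemes)][n_frames]

def recognize(log_probs, lexicon, **durations):
    """Return the lexicon word whose pronunciation scores highest."""
    return max(lexicon, key=lambda w: score_word(log_probs, lexicon[w], **durations))

# Toy example: three frames that look like "a", then one that looks like "b".
frame_a = {"a": math.log(0.9), "b": math.log(0.1)}
frame_b = {"a": math.log(0.1), "b": math.log(0.9)}
frames = [frame_a, frame_a, frame_a, frame_b]
lexicon = {"ab": ["a", "b"], "ba": ["b", "a"]}
best_word = recognize(frames, lexicon)
```

Setting `min_dur` above 1 rules out alignments in which a phoneme is squeezed into very few frames, which is the kind of restriction a duration constraint imposes; an alignment that cannot satisfy the limits scores negative infinity.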
