Paper information - Discriminative training for large vocabulary telephone-based name recognition

This paper describes progress on a commercial application of the MECS recognition system to the task of recognizing Japanese family names spoken by customers into the answering machines of a large marketing/human resource company. The task is thus speaker-independent, open vocabulary, and is characterized by large variation in caller speaking styles, telephone types and acoustic environments. Our results show that context-independent hidden Markov models trained discriminatively with the minimum classification error criterion are a practical alternative to context-dependent models based on phonetic decision trees, yielding better performance with a much smaller number of parameters. On this difficult task we have obtained 59% correct family name recognition. A phoneme-based confidence measure enables us to obtain 85% correct name recognition for accepted utterances, at an overall utterance acceptance rate of 15%.

Alain Biem | Shigeru Katagiri | Erik McDermott | Seiichi Tenpaku
