
PRONUNCIATION MODELING FOR SPONTANEOUS SPEECH BY MAXIMIZING WORD CORRECT RATE IN A PRODUCTION-RECOGNITION MODEL

In this paper, we develop a new method for compiling a pronunciation dictionary that models pronunciation variation in spontaneous speech recognition. The dictionary is assembled by iteratively selecting pronunciations from a data-driven word confusion table so as to directly maximize the word correct rate simulated by a production-recognition model, so that the best recognition performance can be achieved. In other words, the compiled dictionary can include as many pronunciations as necessary while avoiding the additional confusion that extra pronunciations may introduce during recognition. The word correct rate is simulated with a novel human-machine communication model consisting of a human speech production module and a machine speech recognition module. Experimental results on the LDC Mandarin Call Home and Call Friend corpora showed that this new approach achieves significant improvement. Furthermore, the framework and theory presented here are applicable to other languages.
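The selection loop described in the abstract can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' implementation: it assumes a toy production module in which a speaker utters word w with prior P(w) and realizes it as surface pronunciation v with probability P(v|w) taken from a confusion table, and a toy recognition module that, on hearing v, chooses among all words whose dictionary entries list v in proportion to their priors. The function names `recognize_prob`, `word_correct_rate`, and `select_pronunciations`, the greedy stopping rule, and all probabilities are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical types; a pronunciation is a space-separated phone string such as "sh i".
Prior = Dict[str, float]                  # P(w): word prior, e.g. unigram frequency
Confusion = Dict[str, Dict[str, float]]   # P(v | w): data-driven word confusion table
Lexicon = Dict[str, Set[str]]             # word -> pronunciations listed in the dictionary


def recognize_prob(word: str, pron: str, lexicon: Lexicon, prior: Prior) -> float:
    """Assumed recognition module: P(output = word | heard pron).

    Every word whose entry lists `pron` competes, weighted by its prior; a
    pronunciation missing from `word`'s entry is never recognized as `word`.
    """
    if pron not in lexicon.get(word, set()):
        return 0.0
    total = sum(prior[w] for w, prons in lexicon.items() if pron in prons)
    return prior[word] / total if total > 0 else 0.0


def word_correct_rate(lexicon: Lexicon, prior: Prior, confusion: Confusion) -> float:
    """Expected word correct rate of the simulated production-recognition loop."""
    return sum(
        prior[w] * p_v * recognize_prob(w, v, lexicon, prior)
        for w, variants in confusion.items()  # production: the speaker intends w ...
        for v, p_v in variants.items()        # ... and utters surface form v
    )


def select_pronunciations(
    base: Lexicon, prior: Prior, confusion: Confusion, min_gain: float = 1e-12
) -> Tuple[Lexicon, float]:
    """Greedily add the (word, pronunciation) pair that most raises the simulated WCR."""
    lexicon = {w: set(prons) for w, prons in base.items()}
    current = word_correct_rate(lexicon, prior, confusion)
    while True:
        best_pair, best_gain = None, min_gain
        for w, variants in confusion.items():
            for v in variants:
                if v in lexicon.get(w, set()):
                    continue  # already in the dictionary
                trial = {k: set(p) for k, p in lexicon.items()}
                trial.setdefault(w, set()).add(v)
                gain = word_correct_rate(trial, prior, confusion) - current
                if gain > best_gain:
                    best_pair, best_gain = (w, v), gain
        if best_pair is None:  # no remaining variant raises the simulated WCR
            return lexicon, current
        lexicon.setdefault(best_pair[0], set()).add(best_pair[1])
        current = word_correct_rate(lexicon, prior, confusion)


# Toy example with made-up probabilities.
prior = {"shi": 0.5, "si": 0.3, "zhi": 0.2}
confusion = {
    "shi": {"sh i": 0.6, "s i": 0.4},
    "si": {"s i": 0.9, "sh i": 0.1},
    "zhi": {"zh i": 0.7, "z i": 0.3},
}
base = {"shi": {"sh i"}, "si": {"s i"}, "zhi": {"zh i"}}
lexicon, wcr = select_pronunciations(base, prior, confusion)
print(sorted(lexicon["zhi"]), round(wcr, 2))  # ['z i', 'zh i'] 0.77
```

In this toy case the search adds the variant "z i" to "zhi", which no other word claims, raising the simulated word correct rate from 0.71 to 0.77. It rejects "s i" for "shi" and "sh i" for "si": each would win some recognitions of its own word's reduced forms but take away more correct recognitions from the competing word, so the net rate would drop. Re-evaluating every candidate at each step is quadratic in the number of candidates and is meant only to make the trade-off visible.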

Lin-Shan Lee, M.-Y. Tsai
