Paper information - Pronunciation modeling for ASR - knowledge-based and data-derived methods

This paper focuses on modeling pronunciation variation in two different ways: data-derived and knowledge-based. The knowledge-based approach uses phonological rules to generate pronunciation variants. The data-derived approach performs phone recognition and then smooths the output with decision trees (D-trees) to alleviate some of the phone recognition errors. Using phonological rules led to a small improvement in word error rate (WER); the data-derived approach, in which the phone recognition output was smoothed with D-trees prior to lexicon generation, led to larger improvements over the baseline. The lexicon was employed in two different recognition systems, a hybrid HMM/ANN system and an HMM-based system, to ascertain whether pronunciation variation was truly being modeled. This proved to be the case, as no significant differences were found between the results obtained with the two systems. A comparison between the knowledge-based and data-derived methods showed that 17% of the variants generated by the phonological rules were also found using phone recognition, and that this figure increased to 46% when the phone recognition output was smoothed using D-trees.
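The comparison in the last sentence of the abstract can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the optional-deletion rules, the toy phone strings, and the helper names `rule_variants` and `overlap_percentage` are hypothetical and are not taken from the paper, which may define its rules and its overlap measure differently.

```python
from itertools import product

# Hypothetical optional-deletion rules (not the paper's rule set): each phone
# maps to the forms it may surface as; "" means the phone is deleted.
RULES = {"n": ["n", ""], "t": ["t", ""]}


def rule_variants(canonical, rules=RULES):
    """Return all non-canonical pronunciations obtained by applying the
    rules independently to each phone of a space-separated transcription."""
    options = [rules.get(phone, [phone]) for phone in canonical.split()]
    variants = {
        " ".join(phone for phone in combo if phone)
        for combo in product(*options)
    }
    variants.discard(canonical)  # keep only the generated variants
    return variants


def overlap_percentage(rule_lexicon, data_lexicon):
    """Percentage of rule-generated (word, variant) pairs that also occur
    in the data-derived lexicon."""
    rule_pairs = {(w, v) for w, vs in rule_lexicon.items() for v in vs}
    data_pairs = {(w, v) for w, vs in data_lexicon.items() for v in vs}
    if not rule_pairs:
        return 0.0
    return 100.0 * len(rule_pairs & data_pairs) / len(rule_pairs)


# Toy data: one made-up word with a made-up canonical transcription.
rule_lexicon = {"w": rule_variants("a n t")}
data_lexicon = {"w": {"a n", "a d"}}  # stand-in for phone recognition output
print(overlap_percentage(rule_lexicon, data_lexicon))
```

Under this reading, the overlap is measured relative to the rule-generated variants, so a data-derived lexicon that contains many extra variants does not lower the figure; it only matters how many rule variants it recovers.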

Mirjam Wester
