A data-driven method for modeling pronunciation variation

This paper describes a rule-based, data-driven (DD) method for modeling pronunciation variation in automatic speech recognition (ASR). The DD method consists of three steps. First, possible pronunciation variants are generated by making each phone in the canonical transcription of a word optional. Next, forced recognition is performed to determine which variant best matches the acoustic signal. Finally, rules are derived by aligning the best-matching variant with the canonical transcription of the word. An error analysis is performed to gain insight into the process of pronunciation modeling. This analysis shows that although modeling pronunciation variation brings about improvements, it also introduces deteriorations. A strong correlation is found between the number of improvements and the number of deteriorations per rule. This result indicates that ASR performance cannot be improved simply by excluding the rules that cause deteriorations, because these rules also produce a considerable number of improvements. Lastly, we compare three different criteria for rule selection. This comparison indicates that the absolute frequency of rule application (Fabs) is the most suitable criterion for rule selection. For the best testing condition, a statistically significant reduction in word error rate (WER) of 1.4% absolute, or 8% relative, is found.
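The three steps of the DD method, followed by Fabs-based rule selection, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. Forced recognition is replaced by a hypothetical `score` callable (a real system would use the recognizer's acoustic likelihood for each word token). The empty variant is excluded. Each rule is assumed to record the deleted phone together with its immediate canonical neighbours as context, with `#` marking a word edge. All function names, the toy phone strings and the `min_count` threshold are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Callable, Iterable

Phones = tuple[str, ...]
Mask = tuple[bool, ...]        # True = phone kept, False = phone deleted
Rule = tuple[str, str, str]    # (left context, deleted phone, right context)


def generate_variants(canonical: Phones) -> list[Mask]:
    """Step 1: make each phone optional (every keep/delete combination).

    The all-deleted (empty) variant is skipped; this is an assumption of
    the sketch. The first mask generated is the canonical form itself.
    """
    return [m for m in product((True, False), repeat=len(canonical)) if any(m)]


def realize(canonical: Phones, mask: Mask) -> Phones:
    """Phone string of the variant encoded by `mask`."""
    return tuple(p for p, keep in zip(canonical, mask) if keep)


def best_variant(canonical: Phones, score: Callable[[Phones], float]) -> Mask:
    """Step 2: stand-in for forced recognition, picking the best-scoring variant.

    `score` is hypothetical. max() returns the first maximum, so ties go to
    the canonical form, which is generated first.
    """
    return max(generate_variants(canonical),
               key=lambda m: score(realize(canonical, m)))


def derive_rules(canonical: Phones, mask: Mask) -> list[Rule]:
    """Step 3: align the chosen variant with the canonical transcription.

    Variants differ from the canonical form only by deletions, so the mask
    already is the alignment: each False position is one deleted phone.
    Context = neighbouring canonical phones, "#" at word edges (assumed).
    """
    rules = []
    for i, keep in enumerate(mask):
        if not keep:
            left = canonical[i - 1] if i > 0 else "#"
            right = canonical[i + 1] if i + 1 < len(canonical) else "#"
            rules.append((left, canonical[i], right))
    return rules


def select_rules_fabs(rule_lists: Iterable[list[Rule]],
                      min_count: int) -> dict[Rule, int]:
    """Keep rules whose absolute application frequency (Fabs) >= min_count."""
    fabs = Counter(rule for rules in rule_lists for rule in rules)
    return {rule: n for rule, n in fabs.items() if n >= min_count}


if __name__ == "__main__":
    lopen = ("l", "o", "p", "@", "n")  # toy canonical transcription
    # Toy "signals": two tokens lack the final /n/, one is canonical.
    observed_tokens = [("l", "o", "p", "@"), ("l", "o", "p", "@"), lopen]
    per_token_rules = []
    for observed in observed_tokens:
        mask = best_variant(lopen, lambda v, obs=observed: 1.0 if v == obs else 0.0)
        per_token_rules.append(derive_rules(lopen, mask))
    print(select_rules_fabs(per_token_rules, min_count=2))
    # prints {('@', 'n', '#'): 2}
```

Because every variant here is obtained purely by deletion, the keep/delete mask doubles as the alignment with the canonical form; variants produced in other ways would need a proper alignment, such as dynamic-programming string alignment. Exhaustive generation yields 2^n − 1 variants for a word of n phones, so it is only practical for short words.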

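The absolute and relative figures for the best condition are linked through the baseline WER. A quick check, assuming both quoted numbers are rounded, shows roughly what baseline they imply; this is a reconstruction from the two percentages, not a value reported here:

$$
\Delta_{\text{rel}} = \frac{\Delta_{\text{abs}}}{\mathrm{WER}_{\text{base}}}
\quad\Rightarrow\quad
\mathrm{WER}_{\text{base}} \approx \frac{1.4\%}{0.08} = 17.5\%
$$

Allowing for rounding (1.35% to 1.45% absolute, 7.5% to 8.5% relative), the implied baseline lies between about 16% and 19%.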