On the importance of exception and cross-word rules for the data-driven creation of lexica for ASR

Building on earlier work [1], we developed [2] a new data-driven approach for building a lexicon with multiple pronunciation variants per word. The method automatically learns stochastic pronunciation rules, which are then used to transform a reference pronunciation (e.g. one taken from a pronunciation dictionary) into a list of pronunciation variants. The results obtained with the new approach were quite spectacular: the word error rate on TIMIT was reduced by more than 45% in a closed-vocabulary situation. During the development of our system we argued that two rule types are needed: cross-word rules and exception rules. The latter prohibit rather than generate a pronunciation variant in a particular situation. In this contribution we describe experiments that assess the importance of these two rule types. The results indicate that ignoring the cross-word rules forfeits about 40% of the benefit of our approach. In contrast, giving less weight to the exception rules appears to be quite acceptable. On this basis we propose a new rule learning system that is 9 times faster yet offers about the same performance.

Keywords: lexical modeling; cross-word rules; exception rules; ASR system
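The interplay of generating rules, exception rules and cross-word context can be illustrated with a minimal sketch. It is not the system of [2]: the `Rule` fields, the `variants` function, the phone labels and all probabilities are hypothetical and chosen only for illustration. A rule rewrites a focus phone in a given left/right context with some probability, an exception rule prohibits a variant in a more specific context, and a word boundary is written as "#" unless the neighbouring word's phone is supplied.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Rule:
    left: str                # left-context phone, "*" matches anything
    focus: str               # phone to be rewritten
    right: str               # right-context phone, "*" matches anything
    output: str              # replacement phone, "" means deletion
    prob: float              # probability of the variant (made-up values)
    exception: bool = False  # True: prohibit `output` in this context

def _fits(pattern, phone):
    return pattern == "*" or pattern == phone

def _options(seq, i, rules):
    """Map each possible realisation of seq[i] to its probability."""
    hits = [r for r in rules
            if r.focus == seq[i]
            and _fits(r.left, seq[i - 1]) and _fits(r.right, seq[i + 1])]
    banned = {r.output for r in hits if r.exception}
    opts = {}
    for r in hits:
        if not r.exception and r.output not in banned:
            opts[r.output] = opts.get(r.output, 0.0) + r.prob
    # Remaining probability mass goes to the unchanged reference phone.
    opts[seq[i]] = max(0.0, 1.0 - sum(opts.values()))
    return opts

def variants(phones, rules, left_ctx="#", right_ctx="#", min_prob=0.05):
    """Expand a reference pronunciation into (variant, probability) pairs.

    left_ctx / right_ctx are the neighbouring words' edge phones; the
    default "#" means only the word boundary is visible (no cross-word context).
    """
    seq = [left_ctx] + list(phones) + [right_ctx]
    per_pos = [_options(seq, i, rules) for i in range(1, len(seq) - 1)]
    result = {}
    for combo in product(*(o.items() for o in per_pos)):
        p = 1.0
        for _, q in combo:
            p *= q
        if p >= min_prob:
            pron = " ".join(out for out, _ in combo if out)
            result[pron] = result.get(pron, 0.0) + p
    return sorted(result.items(), key=lambda kv: -kv[1])

# Hypothetical rules: delete "d" after "n", except before a following "ae".
RULES = [
    Rule("n", "d", "*", "", 0.4),
    Rule("n", "d", "ae", "", 0.0, exception=True),
]

print(variants(["ae", "n", "d"], RULES))                  # boundary context only
print(variants(["ae", "n", "d"], RULES, right_ctx="ae"))  # cross-word context
```

With only the word boundary visible, the deletion variant is generated. When the next word's first phone is passed in as cross-word context, the exception rule fires and only the reference pronunciation remains.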

[1] Helmer Strik et al. Modeling pronunciation variation for ASR: A survey of the literature. Speech Commun., 1999.

[2] Jean-Pierre Martens et al. In search of better pronunciation models for speech recognition. Speech Commun., 1999.

[3] Jean-Pierre Martens et al. Context modeling in hybrid segment-based/neural network recognition systems. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98), 1998.

[4] Qian Yang et al. Data-driven lexical modeling of pronunciation variations for ASR. INTERSPEECH, 2000.