Based on earlier work [1], we developed [2] a new data-driven approach for building a lexicon with multiple pronunciation variants per word. The method automatically learns stochastic pronunciation rules, which are then used to transform a reference pronunciation (e.g. one taken from a pronunciation dictionary) into a list of pronunciation variants. The results obtained with this approach were striking: in a closed-vocabulary setting, the word error rate on TIMIT was reduced by more than 45%. During the development of our system, we argued for the need for cross-word rules and exception rules. The latter prohibit, rather than generate, a pronunciation variant in a particular context. In this contribution we describe experiments that assess the importance of these two rule types. The results indicate that ignoring the cross-word rules forfeits about 40% of the benefit of our approach. On the other hand, a partial reduction in the importance of the exception rules appears to be acceptable. Building on these findings, we propose a new rule learning system that is 9 times faster while offering about the same performance.
Keywords— lexical modeling; cross-word rules; exception rules; ASR system
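The rule mechanism summarized above can be illustrated with a small Python sketch. It covers stochastic rules that rewrite a reference pronunciation, exception rules that block a rewrite, and cross-word rules whose context reaches into a neighbouring word. It is not the authors' learning system. Several parts are assumptions made for illustration:

- the hypothetical `Rule` format;
- the `#` word-boundary marker;
- picking the most specific matching rule;
- applying all rules in parallel to the reference pronunciation;
- treating phone positions as independent.

The rules in `RULES` are made up, not learned from TIMIT or any other corpus.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule format: rewrite `focus` as `output` ("" = deletion) when the
    reference phones immediately left/right of it equal `left`/`right`.
    '#' marks a word boundary."""
    focus: str
    output: str
    left: tuple = ()
    right: tuple = ()
    prob: float = 0.0
    exception: bool = False  # True: forbids focus -> output in this context


def is_cross_word(rule):
    """True if the context reaches past a '#' into a neighbouring word."""
    return "#" in rule.left[1:] or "#" in rule.right[:-1]


def matches(phones, i, rule):
    """True if `rule` applies at position i of the reference pronunciation."""
    left = tuple(phones[max(0, i - len(rule.left)):i])
    right = tuple(phones[i + 1:i + 1 + len(rule.right)])
    return phones[i] == rule.focus and left == rule.left and right == rule.right


def options_at(phones, i, rules):
    """Alternatives (output, prob) for position i; exception rules veto their output."""
    hits = [r for r in rules if matches(phones, i, r)]
    blocked = {r.output for r in hits if r.exception}
    best = {}  # output -> most specific (longest-context) generating rule
    for r in hits:
        if r.exception or r.output in blocked:
            continue
        spec = len(r.left) + len(r.right)
        old = best.get(r.output)
        if old is None or spec > len(old.left) + len(old.right):
            best[r.output] = r
    alts = {out: r.prob for out, r in best.items()}
    keep = max(0.0, 1.0 - sum(alts.values()))  # remaining mass: keep the reference phone
    alts[phones[i]] = alts.get(phones[i], 0.0) + keep
    return [(o, p) for o, p in alts.items() if p > 0]


def variants(phones, rules, min_prob=0.05):
    """Enumerate variants with probabilities, pruning those below `min_prob`."""
    per_pos = [options_at(phones, i, rules) for i in range(len(phones))]
    out = {}
    for combo in product(*per_pos):
        p = 1.0
        for _, q in combo:
            p *= q
        if p >= min_prob:
            seq = tuple(o for o, _ in combo if o)  # drop deleted phones
            out[seq] = out.get(seq, 0.0) + p
    return sorted(out.items(), key=lambda kv: -kv[1])


# Made-up rules for illustration only (not learned from any corpus).
RULES = [
    Rule("t", "", right=("#",), prob=0.4),                      # word-final /t/ deletion
    Rule("t", "", right=("#", "t"), prob=0.7),                  # cross-word: before a word-initial /t/
    Rule("t", "", left=("ae",), right=("#",), exception=True),  # exception: keep /t/ after /ae/
]

if __name__ == "__main__":
    last_time = ["l", "ae", "s", "t", "#", "t", "ay", "m"]
    print(variants(last_time, RULES))
    print(variants(last_time, [r for r in RULES if not is_cross_word(r)]))
    print(variants(["k", "ae", "t", "#", "t", "uw"], RULES))
```

In this toy setting, removing the cross-word rule lowers the deletion probability of the word-final /t/ in "last time" from 0.7 to 0.4. A lexicon built without cross-word rules cannot model variants that arise only at word junctions. The exception rule keeps the /t/ in "cat two" even though both generating rules match there.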
[1] Helmer Strik et al., "Modeling pronunciation variation for ASR: A survey of the literature," Speech Commun., 1999.
[2] Jean-Pierre Martens et al., "In search of better pronunciation models for speech recognition," Speech Commun., 1999.
[3] Jean-Pierre Martens et al., "Context modeling in hybrid segment-based/neural network recognition systems," in Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), 1998.
[4] Qian Yang et al., "Data-driven lexical modeling of pronunciation variations for ASR," in INTERSPEECH, 2000.