Machine learning for modeling Dutch pronunciation variation

This paper describes the use of rule induction techniques for the automatic extraction of phonemic knowledge and rules from pairs of pronunciation lexicons. This extracted knowledge allows the adaptation of speech processing systems to regional variants of a language. As a case study, we apply the approach to Northern Dutch and Flemish (the variant of Dutch spoken in Flanders, a part of Belgium) , based on Celex and Fonilex, pronunciation lexicons for Northern Dutch and Flemish, respectively. In our study, we compare two rule induction techniques, TransformationBased Error-Driven Learning (TBEDL) (Brill, 1995) and C5.0 (Quinlan, 1993), and evaluate the extracted knowledge quantitatively (accuracy) and qualitatively (linguistic relevance of the rules). We conclude that, whereas classication-based rule induction with C5.0 is more accurate, the transformation rules learned with TBEDL can be more easily interpreted.