The generation of letter-to-sound rules for grapheme-to-phoneme conversion

This paper presents an approach to letter-to-sound translation for the Polish language, developed as part of a speech recognition system. It describes the automatic generation of Polish letter-to-sound (LTS) rules. The LTS rules were trained on a Polish phonetic lexicon extracted from the Polish Wiktionary, an on-line dictionary. This lexicon contains 35,826 entries. We examined a novel method for creating the allowable letter-to-phone pairing map using the IBM Model 1 algorithm. The automatically generated allowable pairs were compared with a second pairing map created by an expert. Each of the two pairing maps was used separately to train the Polish LTS rules. The test results show that the automatically generated pairing map yields a more compact LTS model than the expert-made map.
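To illustrate how IBM Model 1 can produce allowable letter-to-phone pairs, here is a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the alignment direction (each letter is aligned to one phone or to a `NULL` symbol marking a silent letter), the `NULL` symbol itself, the toy lexicon, and the 0.1 share threshold in the hypothetical `allowable_pairs` helper. EM estimates t(letter | phone). A (letter, phone) pair is then kept when its share of the letter's expected alignment count exceeds the threshold.

```python
from collections import defaultdict
from itertools import product

NULL = "_"  # hypothetical epsilon symbol: a letter aligned to NULL is silent


def ibm_model1(pairs, iterations=10):
    """Estimate t(letter | phone) with IBM Model 1 EM (illustrative sketch).

    pairs: list of (letters, phones) sequences. Assumption: each letter
    (target side) is generated by one phone or by NULL (source side).
    Returns t and the expected letter-phone counts from the last E-step.
    """
    letters = {l for ls, _ in pairs for l in ls}
    phones = {p for _, ps in pairs for p in ps} | {NULL}
    # Uniform initialisation of t(letter | phone).
    t = {(l, p): 1.0 / len(letters) for l, p in product(letters, phones)}
    counts = {}
    for _ in range(iterations):
        counts = defaultdict(float)  # expected count c(letter, phone)
        total = defaultdict(float)   # expected count c(phone)
        for ls, ps in pairs:
            src = [NULL] + list(ps)
            for l in ls:
                # E-step: posterior over which phone position generated l.
                z = sum(t[(l, p)] for p in src)
                for p in src:
                    c = t[(l, p)] / z
                    counts[(l, p)] += c
                    total[p] += c
        # M-step: renormalise expected counts into t(letter | phone).
        t = {(l, p): counts[(l, p)] / total[p] if total[p] else 0.0
             for l, p in product(letters, phones)}
    return t, counts


def allowable_pairs(counts, threshold=0.1):
    """Hypothetical helper: keep (letter, phone) pairs whose share of the
    letter's total expected count exceeds `threshold`."""
    per_letter = defaultdict(float)
    for (l, _), c in counts.items():
        per_letter[l] += c
    return {(l, p) for (l, p), c in counts.items()
            if per_letter[l] and c / per_letter[l] > threshold}


# Toy lexicon (assumed, not taken from the paper's Wiktionary data).
lexicon = [
    (list("kot"), ["k", "o", "t"]),
    (list("tak"), ["t", "a", "k"]),
    (list("kat"), ["k", "a", "t"]),
    (list("to"), ["t", "o"]),
    (list("szok"), ["ʂ", "o", "k"]),  # digraph "sz": one letter should go to NULL
]

if __name__ == "__main__":
    t, counts = ibm_model1(lexicon, iterations=20)
    for pair in sorted(allowable_pairs(counts)):
        print(pair)
```

Thresholding the per-letter share, rather than t(letter | phone) directly, gives each letter its own normalised list of candidate phones (including `NULL`), which matches the shape of an allowable pairing map. A real system would tune the threshold and the number of EM iterations on held-out data.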