Verifying pronunciation dictionaries using conflict analysis

We describe a new language-independent technique for automatically identifying errors in an electronic pronunciation dictionary by analyzing the source of conflicting patterns directly. We evaluate the effectiveness of the technique in two ways: we perform a controlled experiment using artificially corrupted data (allowing us to measure precision and recall exactly); and then apply the technique to a real-world pronunciation dictionary, demonstrating its effectiveness in practice. We also introduce a new freely available pronunciation resource (the RCRL Afrikaans Pronunciation Dictionary), the largest such dictionary that currently exists. Index Terms: pronunciation dictionaries, error detection, quality verification, Default&Refine, grapheme-to-phoneme, g2p