Error analysis of a public domain pronunciation dictionary

We explore pattern recognition techniques for verifying the correctness of a pronunciation lexicon, focusing on techniques that require limited human interaction. We evaluate the British English Example Pronunciation (BEEP) dictionary [1], a public domain resource that is widely used in English speech processing systems. The techniques under investigation are applied to the lexicon, and the results of each step are illustrated using sample entries. We find that as many as 5553 words in the BEEP dictionary are incorrect. We demonstrate the effect of the correction techniques on the lexicon and incorporate the lexicon into an automatic speech recognition (ASR) system.