Pronunciation dictionary development in resource-scarce environments

The deployment of speech technology systems in the developing world is often hampered by the lack of appropriate linguistic resources. A suitable pronunciation dictionary is one such resource that can be difficult to obtain for lesser-resourced languages. We design a process for the development of pronunciation dictionaries in resource-scarce environments, and apply this to the development of pronunciation dictionaries for ten of the official languages of South Africa. We define the semiautomated development and verification process in detail and discuss practicalities, outcomes and lessons learnt. We analyse the accuracy of the developed dictionaries and demonstrate how the distribution of rules generated from the dictionaries provides insight into the inherent predictability of the languages studied.

Index Terms: pronunciation dictionaries, dictionary verification, resource-scarce, bootstrapping, Southern Bantu languages.