The deployment of speech technology systems in the developing world is often hampered by a lack of appropriate linguistic resources. A suitable pronunciation dictionary is one such resource, and it can be difficult to obtain for lesser-resourced languages. We design a process for developing pronunciation dictionaries in resource-scarce environments and apply it to ten of the official languages of South Africa. We describe the semi-automated development and verification process in detail and discuss practicalities, outcomes and lessons learnt. We analyse the accuracy of the developed dictionaries and demonstrate how the distribution of rules generated from the dictionaries provides insight into the inherent predictability of the languages studied.

Index Terms: pronunciation dictionaries, dictionary verification, resource-scarce, bootstrapping, Southern Bantu languages.