Discriminative pronunciation learning for speech recognition for resource-scarce languages

In this paper, we describe a method for building small-vocabulary speech recognition capability in resource-scarce languages. By resource-scarce languages, we mean languages with a small or economically disadvantaged user base, which the commercial world typically ignores. We use a high-quality, well-trained speech recognizer as our baseline, which removes the dependence on large amounts of audio data for training an accurate acoustic model. Through cross-language phoneme mapping, the baseline recognizer effectively recognizes words in our target language. We automatically generate a set of initial pronunciations for each word in the vocabulary. We then use discriminative training to remove potential conflicts in word recognition.
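To make the pipeline concrete, the following is a minimal Python sketch of the two steps named above: expanding each word into candidate pronunciations through a cross-language phoneme map, then choosing candidates that are least confusable with the other words. It is an illustration under stated assumptions, not the method of the paper. The phoneme map `PHONE_MAP`, the toy vocabulary, and the use of edit distance as a stand-in for acoustic confusability are all hypothetical; a real system would presumably score candidates against recorded audio with the baseline recognizer.

```python
from itertools import product

# Hypothetical map from target-language phonemes to baseline-recognizer
# phonemes. The values are illustrative only, not taken from the paper.
PHONE_MAP = {
    "a": ["AA", "AH"],
    "k": ["K"],
    "t": ["T", "D"],
    "i": ["IY"],
}


def candidate_pronunciations(target_phones):
    """Expand each target phoneme into every mapped baseline phoneme."""
    options = [PHONE_MAP[p] for p in target_phones]
    return [tuple(c) for c in product(*options)]


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def select_discriminative(vocab):
    """Greedy pass: for each word, keep the candidate that is farthest
    (by edit distance, an assumed proxy for confusability) from the
    pronunciations currently chosen for all other words."""
    chosen = {w: candidate_pronunciations(p)[0] for w, p in vocab.items()}
    for word, phones in vocab.items():
        others = [chosen[w] for w in vocab if w != word]
        chosen[word] = max(
            candidate_pronunciations(phones),
            key=lambda c: min((edit_distance(c, o) for o in others), default=0),
        )
    return chosen


# Toy vocabulary of two easily confused words (hypothetical).
vocab = {"kat": ["k", "a", "t"], "kit": ["k", "i", "t"]}
chosen = select_discriminative(vocab)
```

In this toy run the initial pronunciations of the two words differ in one phoneme, and the greedy pass picks variants that differ in two. Edit distance is only a convenient placeholder here; swapping in recognizer scores on held-out recordings would change the `key` function but not the overall structure.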
