Speech translation for low-resource languages: the case of Pashto

We present challenges and solutions that arose in developing a speech translation system for American English and Pashto, highlighting those specific to a very low-resource language. In particular, we address issues posed by Pashto in written representation, corpus creation, speech recognition, speech synthesis, and grammar development for translation.
