Speech Recognition Engineering Issues in Speech to Speech Translation System Design for Low Resource Languages and Domains

Engineering automatic speech recognition (ASR) for speech-to-speech (S2S) translation systems is immensely challenging, especially when targeting languages and domains that lack readily available spoken language resources. In addition to contending with the conventional data-hungry demands of acoustic and language modeling, these designs must accommodate varying requirements imposed by the domain's needs and characteristics, the target device and usage modality (such as phrase-based or spontaneous free-form interaction, with or without visual feedback), and the large spoken-language variability arising from socio-linguistic and cultural differences among users. Drawing on case studies of speech translation systems between English and languages such as Pashto and Farsi, this paper describes some of the practical issues, and the solutions developed for them, in multilingual ASR development. These include novel acoustic and language modeling strategies such as language-adaptive recognition, active-learning-based language modeling, and class-based language models that better exploit resource-poor language data; efficient search strategies, including N-best list and confidence-score generation to support multiple-hypothesis translation; the use of dialog information and careful interface choices to facilitate ASR; and audio interface design that meets both usability and robustness requirements.
