Cross-lingual dialog model for speech to speech translation

Speech understanding through concept classification offers one approach to machine translation in speech-to-speech translation systems and can be used in conjunction with conventional statistical machine translation. While correct concept classification offers the promise of well-formed target-language speech output, the approach does not scale well to a large number of concepts. It is also critical to know when to accept or reject the classifier's output. We formulate speech classification as a MAP estimation problem to derive the understanding model, and improve its performance by incorporating dialog context information. Specifically, for a two-way speech translation system, we derive a classification scheme that uses context information from both sides of the conversation through an n-gram dialog model. The method was evaluated using data from an English-Farsi trans-lingual doctor-patient dialog system, and its classification and rejection accuracies were compared to those of a baseline system with an understanding model only. Incorporating context with the proposed dialog model provided a modest improvement in classification accuracy (about 5% relative error reduction) and a significant improvement in rejection accuracy (up to 31.4% relative reduction in error).
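
The MAP formulation with dialog context can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a bigram dialog model (one previous concept as context, possibly from the other speaker), a precomputed understanding-model likelihood for each concept, and a simple posterior threshold for rejection. The function name `map_classify`, the concept labels, the probability floor, and all numbers are hypothetical.

```python
def map_classify(likelihoods, dialog_bigram, prev_concept, reject_threshold=0.5):
    """Choose the concept c maximizing P(utterance | c) * P(c | prev_concept).

    likelihoods:   dict concept -> P(utterance | concept), assumed to come
                   from an understanding model (hypothetical values here).
    dialog_bigram: dict (prev_concept, concept) -> P(concept | prev_concept).
    Returns (concept, posterior), or (None, posterior) if rejected.
    """
    floor = 1e-6  # assumed back-off probability for unseen concept pairs
    scores = {
        c: lik * dialog_bigram.get((prev_concept, c), floor)
        for c, lik in likelihoods.items()
    }
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    posterior = scores[best] / total  # normalized over candidate concepts
    if posterior < reject_threshold:
        return None, posterior  # reject: confidence too low to accept
    return best, posterior


# Hypothetical doctor-side utterance following a patient turn "report_pain".
likelihoods = {"ask_pain_location": 0.30, "ask_fever": 0.35, "greet": 0.05}
dialog_bigram = {
    ("report_pain", "ask_pain_location"): 0.6,
    ("report_pain", "ask_fever"): 0.2,
    ("report_pain", "greet"): 0.01,
}
concept, posterior = map_classify(likelihoods, dialog_bigram, "report_pain")
print(concept, round(posterior, 3))
```

In this toy case the understanding model alone would favor `ask_fever`, while the dialog context shifts the decision to `ask_pain_location`. Raising `reject_threshold` makes the classifier reject more often, which is the accept/reject trade-off the abstract refers to.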
