Applications of Language Modeling in Speech-to-Speech Translation

This paper describes language modeling issues that arise in a speech-to-speech translation system and how we addressed them in the IBM speech-to-speech system developed for the DARPA Babylon program, which performs two-way translation between English and Mandarin Chinese. First, the language models for the speech recognizer had to be adapted to the specific domain to improve recognition performance on in-domain utterances while keeping domain coverage as broad as possible. This adaptation had to account for disfluencies and the lack of punctuation, as well as domain-specific utterances. Second, we used a hybrid semantic/syntactic representation to reduce data sparseness in a statistical natural language generation framework. Serious inflection and synonym issues arise when the words of the target language are selected for the translation output. Instead of relying on tedious handcrafted grammar rules, we used N-gram models as a post-processing step to improve generation performance. When an interpolated language model was applied to a Chinese-to-English translation task, translation performance, measured by the objective BLEU metric, improved substantially from 0.318 to 0.514 when the correct transcription was used as input. Similarly, the BLEU score improved from 0.194 to 0.300 on the same task when the input was speech data.
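To make the idea of N-gram post-processing with an interpolated language model concrete, the following is a minimal sketch and not the system described here. It assumes two add-k smoothed bigram models (one trained on in-domain text, one on general text), a fixed interpolation weight `lam`, and a generator that proposes several candidate outputs differing in inflection. The class `BigramLM`, the helper names, the toy corpora, and the weight of 0.5 are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


class BigramLM:
    """Add-k smoothed bigram model (illustrative; not the paper's model)."""

    def __init__(self, sentences, vocab, k=0.1):
        self.k = k
        self.vocab_size = len(vocab)
        self.bigrams = Counter()
        self.contexts = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def prob(self, prev, cur):
        # P(cur | prev) with add-k smoothing over the shared vocabulary.
        numerator = self.bigrams[(prev, cur)] + self.k
        denominator = self.contexts[prev] + self.k * self.vocab_size
        return numerator / denominator


def interpolated_logprob(sentence, lm_a, lm_b, lam):
    """Log-probability under lam * P_a + (1 - lam) * P_b, per bigram."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = lam * lm_a.prob(prev, cur) + (1.0 - lam) * lm_b.prob(prev, cur)
        total += math.log(p)
    return total


def pick_best(candidates, lm_a, lm_b, lam=0.5):
    """Return the candidate with the highest interpolated score."""
    return max(candidates,
               key=lambda c: interpolated_logprob(c, lm_a, lm_b, lam))


# Toy corpora (hypothetical data, far too small for real use).
in_domain = ["where does it hurt", "he has a fever"]
general = ["he has a car", "they have a car"]

# Shared vocabulary: every word seen in either corpus plus the end marker.
vocab = {w for s in in_domain + general for w in s.split()} | {"</s>"}

lm_domain = BigramLM(in_domain, vocab)
lm_general = BigramLM(general, vocab)

# Two generation candidates that differ only in verb inflection.
candidates = ["he have a fever", "he has a fever"]
best = pick_best(candidates, lm_domain, lm_general, lam=0.5)
print(best)
```

In this sketch the interpolation weight is fixed by hand. In practice it would typically be tuned on held-out data, and the component models would use higher-order N-grams with more careful smoothing.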
