Concept-Based Speech-to-Speech Translation Using Maximum Entropy Models for Statistical Natural Concept Generation

The IBM Multilingual Automatic Speech-To-Speech TranslatOR (MASTOR) system is a research prototype developed for the Defense Advanced Research Projects Agency (DARPA) Babylon/CAST speech-to-speech machine translation program. The system cascades three components: large-vocabulary conversational spontaneous speech recognition, statistical machine translation, and concatenative text-to-speech synthesis. To achieve highly accurate and robust conversational spoken language translation, a concept-based speech-to-speech translation approach is proposed that translates by first understanding the meaning of the automatically recognized text. A decision-tree-based statistical natural language understanding algorithm extracts the semantic information from the input sentences, while a natural language generation (NLG) algorithm predicts the translated text via maximum-entropy-based statistical models. One critical component of our statistical NLG approach is natural concept generation (NCG). The goal of NCG is not only to generate the correct set of concepts in the target language, but also to produce them in an appropriate order. To improve maximum-entropy-based concept generation, a set of new approaches is proposed. One approach improves concept sequence generation in the target language via forward–backward modeling, which selects the hypothesis with the highest combined conditional probability under both the forward and backward generation models. This paradigm allows both the left and right context in the source and target languages to be exploited during concept generation. Another approach selects bilingual features that enable maximum-entropy-based model training on the preannotated parallel corpora. These features are augmented with word-level information in order to achieve higher NCG accuracy while minimizing the total number of distinct concepts and, hence, greatly reducing the concept annotation and natural language understanding effort. The features are further expanded to multiple sets to enhance model robustness. Finally, a confidence threshold is introduced to alleviate data sparseness problems in our training corpora. Experiments show a concept generation error rate reduction of more than 40% on our speech translation corpus within limited domains. Significant improvements in both word error rate and BiLingual Evaluation Understudy (BLEU) score are also achieved in our experiments on speech-to-speech translation.
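The forward–backward selection step described above can be illustrated with a minimal sketch. It is not the MASTOR implementation: the function names, the concept labels, and the hand-set probability tables are hypothetical stand-ins for trained maximum-entropy models, and the toy models ignore the source concepts and condition only on the previously generated target concept. The sketch assumes the combined score is the sum of the forward and backward log probabilities.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Concepts = Tuple[str, ...]
# A step model returns P(next concept | source concepts, concepts generated so far).
StepModel = Callable[[Concepts, Concepts, str], float]


def make_table_model(table: Dict[Tuple[str, str], float],
                     default: float = 0.1) -> StepModel:
    """Toy stand-in for a trained model: looks up (previous concept, next concept)."""
    def model(source: Concepts, context: Concepts, concept: str) -> float:
        prev = context[-1] if context else "<s>"
        return table.get((prev, concept), default)
    return model


def sequence_log_prob(model: StepModel, source: Concepts, target: Concepts,
                      reverse: bool = False) -> float:
    """Log probability of a target concept sequence, generated left-to-right
    (forward) or right-to-left (backward, reverse=True)."""
    order = list(reversed(target)) if reverse else list(target)
    total = 0.0
    for i, concept in enumerate(order):
        total += math.log(model(source, tuple(order[:i]), concept))
    return total


def select_hypothesis(hypotheses: Sequence[Concepts], source: Concepts,
                      forward_model: StepModel,
                      backward_model: StepModel) -> Concepts:
    """Pick the hypothesis with the highest combined forward + backward score."""
    def combined(h: Concepts) -> float:
        return (sequence_log_prob(forward_model, source, h)
                + sequence_log_prob(backward_model, source, h, reverse=True))
    return max(hypotheses, key=combined)


# Hypothetical example: two candidate orderings of the same target concepts.
source: Concepts = ("PLACE", "QUERY")
hypotheses = [("QUERY", "PLACE"), ("PLACE", "QUERY")]

forward_model = make_table_model({
    ("<s>", "QUERY"): 0.6, ("QUERY", "PLACE"): 0.5,
    ("<s>", "PLACE"): 0.4, ("PLACE", "QUERY"): 0.8,
})
# The backward model generates right-to-left, so its "previous" concept
# is the one to the right in the final sequence.
backward_model = make_table_model({
    ("<s>", "PLACE"): 0.7, ("PLACE", "QUERY"): 0.9,
    ("<s>", "QUERY"): 0.3, ("QUERY", "PLACE"): 0.5,
})

best = select_hypothesis(hypotheses, source, forward_model, backward_model)
```

With these toy numbers the forward model alone slightly prefers one ordering, while the combined score prefers the other, which is the situation in which using both left and right context changes the selected concept sequence.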
