Robust Translation of Spontaneous Speech: A Multi-Engine Approach

Verbmobil is a speaker-independent and bidirectional speech-to-speech translation system for spontaneous dialogs that can be accessed via GSM mobile phones. It handles dialogs in three business-oriented domains, with context-sensitive translation between four languages (English, German, Japanese, and Chinese). We show that in Verbmobil's multi-blackboard and multi-engine architecture the results of concurrent processing threads can be combined in an incremental fashion. We argue that all results of concurrent processing modules must come with a confidence value, so that statistically trained selection modules can choose the most promising result. Packed representations together with formalisms for underspecification capture the uncertainties in each processing phase, so that the uncertainties can be reduced by linguistic, discourse and domain constraints as soon as they become applicable. Distinguishing features like the multilingual prosody module and the generation of dialog summaries are highlighted. We conclude that Verbmobil has successfully met the project goals with more than 80% of approximately correct translations and a 90% success rate for dialog tasks. One of the main lessons learned from the Verbmobil project is that the problem of speech-to-speech translation can only be cracked by the combined muscle of deep and shallow processing approaches.
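
The idea of confidence-based selection among concurrent engines can be illustrated with a minimal sketch. This is not Verbmobil's actual selection module: the `Hypothesis` class, the engine labels, and the per-engine `weights` (a stand-in for whatever a statistically trained selector would learn) are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Hypothesis:
    """One translation candidate produced by one engine (hypothetical type)."""
    engine: str        # illustrative label, e.g. "deep" or "shallow"
    text: str          # candidate translation
    confidence: float  # engine-reported confidence, assumed to lie in [0, 1]


def select_best(
    hypotheses: Sequence[Hypothesis],
    weights: Optional[Dict[str, float]] = None,
) -> Hypothesis:
    """Return the candidate with the highest weighted confidence.

    `weights` rescales each engine's confidence so that scores from
    different engines become comparable; engines without an entry
    keep a weight of 1.0. In a real system these values would be
    estimated from training data (an assumption of this sketch).
    """
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    weights = weights or {}
    return max(
        hypotheses,
        key=lambda h: h.confidence * weights.get(h.engine, 1.0),
    )


if __name__ == "__main__":
    candidates = [
        Hypothesis("deep", "Let us meet on Monday.", 0.6),
        Hypothesis("shallow", "We meet Monday.", 0.7),
    ]
    print(select_best(candidates).engine)
    print(select_best(candidates, {"deep": 1.5}).engine)
```

Without weights the raw confidences are compared directly; with weights, an engine whose scores are systematically conservative can still win, which is one plausible reason a trained selector is preferable to a plain maximum.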
