wEBMT: Developing and Validating an Example-Based Machine Translation System using the World Wide Web

We have developed an example-based machine translation (EBMT) system that uses the World Wide Web for two different purposes. First, we populate the system's memory with translations gathered from rule-based MT systems available on the Web. The source strings input to these systems were extracted automatically from an extremely small subset of the rule types in the Penn-II Treebank. In subsequent stages, the resulting ⟨source, target⟩ translation pairs are automatically transformed into a series of resources that make the translation process more successful. Although the output of on-line MT systems is often faulty, we demonstrate in a number of experiments that, when used to seed the memories of an EBMT system, this output can in fact help the system generate high-quality translations robustly. In addition, we demonstrate the relative gain of our EBMT system over the on-line systems. Second, despite the perception that documents available on the Web are of questionable quality, we demonstrate that such resources are extremely useful for automatically postediting the translation candidates proposed by our system.
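The abstract does not specify how Web documents are used to postedit candidates. A minimal sketch, assuming each candidate translation is scored by how often it occurs in Web text and the most frequent one is kept: here a small in-memory list of sentences stands in for the Web, and `count_in_corpus`, `postedit`, and the French example strings are hypothetical illustrations, not the system's actual code.

```python
from typing import Callable, Iterable, List


def count_in_corpus(phrase: str, corpus: Iterable[str]) -> int:
    """Count whole-token occurrences of `phrase` in a toy corpus.

    Hypothetical stand-in for a Web search hit count; no real search API is used.
    """
    target = phrase.lower().split()
    n = len(target)
    total = 0
    for doc in corpus:
        tokens = doc.lower().split()
        # Slide a window of n tokens over the document and compare exactly.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                total += 1
    return total


def postedit(candidates: List[str], hits: Callable[[str], int]) -> str:
    """Return the candidate with the most hits; ties go to the earliest candidate."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    # max() keeps the first maximal element, so earlier candidates win ties.
    return max(candidates, key=hits)


# Illustrative data only: a tiny "Web" used to choose determiner agreement.
corpus = ["la personne est ici", "une personne et la personne", "le chat dort"]


def hits(phrase: str) -> int:
    return count_in_corpus(phrase, corpus)


if __name__ == "__main__":
    print(postedit(["le personne", "la personne"], hits))
```

Matching whole tokens rather than substrings keeps a phrase such as "la personne" from being counted inside "la personnel"; swapping `count_in_corpus` for a real hit-count source would be the natural extension.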
