Linguistic Knowledge and Complexity in an EBMT System Based on Translation Patterns

An approach to Example-Based Machine Translation (EBMT) is presented which extracts translation patterns from a sentence-aligned bilingual corpus. Extraction is carried out by a language-neutral, recursive machine-learning algorithm based on the principle of similar distributions of strings. The extracted translation patterns are generalisations of sentences that are translations of each other; to some extent they resemble transfer rules, but with fewer constraints. The strings and variables of which the translation patterns are composed are aligned to provide a more refined bilingual knowledge source, which is needed for the recombination phase. A non-structural approach based on surface forms is error-prone and liable to produce translation patterns that are false translations. Such errors are highlighted, and solutions are proposed that add external linguistic resources, namely morphological analysis and part-of-speech tagging. The amount of linguistic resources added has consequences for computational complexity and portability.
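The principle of similar distributions can be illustrated with a minimal sketch. It is not the algorithm described in the paper: it assumes whitespace tokenisation, handles single words rather than strings, requires the distributions to match exactly, and is not recursive. The function names (`distribution_index`, `co_distributed_pairs`, `generalise`), the variable symbol `X`, and the toy English-French corpus are all hypothetical and introduced only for illustration.

```python
from collections import defaultdict


def distribution_index(sentences):
    """Map each word to the set of sentence indices in which it occurs."""
    index = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for word in sentence.split():
            index[word].add(i)
    return {word: frozenset(ids) for word, ids in index.items()}


def co_distributed_pairs(source_sentences, target_sentences, min_count=2):
    """Pair source and target words that occur in exactly the same sentence pairs.

    Assumption: an exact match of distributions, with at least `min_count`
    occurrences, is taken as evidence of a translation relation.
    """
    src = distribution_index(source_sentences)
    tgt = distribution_index(target_sentences)
    by_distribution = defaultdict(list)
    for word, ids in tgt.items():
        by_distribution[ids].append(word)
    pairs = []
    for word, ids in src.items():
        if len(ids) >= min_count:
            for candidate in by_distribution.get(ids, []):
                pairs.append((word, candidate, sorted(ids)))
    return sorted(pairs)


def generalise(sentence, kept):
    """Keep the words in `kept`; collapse every other run of words into one variable X."""
    out = []
    for word in sentence.split():
        if word in kept:
            out.append(word)
        elif not out or out[-1] != "X":
            out.append("X")
    return " ".join(out)


# Toy sentence-aligned corpus (invented for this example).
source = ["the cat sleeps", "the dog sleeps", "the cat eats"]
target = ["le chat dort", "le chien dort", "le chat mange"]

pairs = co_distributed_pairs(source, target)
source_pattern = generalise(source[0], {"the", "sleeps"})
target_pattern = generalise(target[0], {"le", "dort"})
```

In this toy corpus, `pairs` links "cat" with "chat", "sleeps" with "dort", and "the" with "le", and the two patterns come out as "the X sleeps" and "le X dort". Because the sketch looks only at surface forms, words that share a distribution by chance would be paired as well, which is the kind of false translation that the added linguistic resources are intended to reduce.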
