Automatic Translation Memory Fuzzy Match Post-Editing: A Step Beyond Traditional TM/MT Integration

An innovative way of integrating Translation Memory (TM) and Machine Translation (MT) processing is presented, which goes beyond the traditional cascade integration of the two. The new method aims to automatically post-edit TM similar matches by means of an MT module, thus enhancing the TM fuzzy (similar) match scores as well as enabling the utilisation of low-score TM fuzzy matches. This leads to a substantial reduction in translation cost. The suggested method, which can be classified as an Example-Based Machine Translation application, is analysed, and examples are provided for clarification. It is evaluated through test results that involve human interaction. The method has been implemented within the ESTeam Translator (ET) Language Toolbox and is already in use in the various commercial installations of ET.

1. Automatic Translation Memory Fuzzy Match Post-Editing

According to the standard TM paradigm (Nagao, 1984), an input text unit (usually a sentence) to be translated is matched against the source language part of the translation pairs stored in the TM. If an identical (full) or similar (fuzzy) match is located, the system suggests its target language equivalent as the translation of the original text unit and lets the user accept or edit this suggestion so that it corresponds accurately to the translation of the input text unit. When no full or fuzzy match can be located, the option is usually offered to invoke MT processing to translate the input text unit.

The method proposed in this paper, which can be classified as an Example-Based Machine Translation application (Somers, 1999), takes TM-MT integration one step further: it manipulates the fuzzy match result by invoking MT (in context) in order to automatically correct the TM-based translation suggestion.

We denote as Sinp-SL the input text unit, for example a sentence, consisting of words to be translated from the Source Language (SL) into the Target Language (TL).
Suppose that the TM contains a text-unit pair, for example sentences again, denoted as Sref-SL and Sref-TL. The standard definition of a fuzzy match translation is that if Sinp-SL is similar to Sref-SL, through the similarity of (some of) their words, then Sref-TL is proposed as the translation of Sinp-SL (to be verified or edited by a human translator). The suggested method exploits the fuzzy match information M(Sinp-SL, Sref-SL) as well as the word-alignment information A(Sref-SL, Sref-TL) of the TM text-unit pair in order to modify Sref-TL so that it corresponds to the translation of Sinp-SL.

The fuzzy match information M(Sinp-SL, Sref-SL) defines the links between the words of Sinp-SL and Sref-SL; in other words, it defines which input-SL word has matched which reference-SL word. This type of information is standard in all TM systems, since it is used to estimate the similarity score of a match.

The word-alignment information A(Sref-SL, Sref-TL), however, is anything but standard. The bottleneck in applying Fuzzy Match Post-Editing is the availability of word-alignment information for the TM contents, which enables the appropriate correction of the TL reference text units. Word-alignment information defines the translation links between the words of the reference-SL and reference-TL text units (the TM pair); in other words, it defines which word or phrase of Sref-SL translates to which word or phrase of Sref-TL (and can, in general, include phrases with non-consecutive words). This information, which is not necessarily exhaustive, can either be calculated on-line (by looking up an MT dictionary) or be pre-stored in the TM. In the ESTeam Translator system, word-alignment information is available through a process of automatically aligning text units at various text levels (paragraphs, sentences, sub-sentences) (Meyers, 1998; Ahrenberg et al., 2000) by the use of (among other resources) an MT Dictionary of words and phrases.
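The two kinds of information described above can be illustrated with a minimal Python sketch. It is not the ESTeam Translator implementation: the function names, the example sentences, the hand-written alignment, and the use of Python's `difflib.SequenceMatcher` as a stand-in word matcher and similarity score are all assumptions made for illustration. M is represented as a mapping from input-SL word positions to reference-SL word positions, and A as a list of (SL positions, TL positions) groups, so that phrases with non-consecutive words can be expressed.

```python
from difflib import SequenceMatcher


def fuzzy_match(inp_sl, ref_sl):
    """Return (M, score) for two tokenised SL text units.

    M maps each matched input-SL word position to a reference-SL position.
    The score is difflib's ratio, used here only as a stand-in for a
    TM system's own similarity measure (an assumption).
    """
    matcher = SequenceMatcher(a=inp_sl, b=ref_sl, autojunk=False)
    links = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            links[block.a + k] = block.b + k
    return links, matcher.ratio()


def aligned_tl_positions(ref_sl_pos, alignment):
    """Return the reference-TL positions linked by A to one reference-SL position."""
    positions = []
    for sl_group, tl_group in alignment:
        if ref_sl_pos in sl_group:
            positions.extend(tl_group)
    return sorted(positions)


# Hypothetical example data (not taken from the paper).
inp_sl = "he gave up the project".split()   # Sinp-SL
ref_sl = "he gave up the plan".split()      # Sref-SL
ref_tl = "er gab den Plan auf".split()      # Sref-TL

# A(Sref-SL, Sref-TL): "gave up" aligns to the non-consecutive "gab ... auf".
alignment = [((0,), (0,)), ((1, 2), (1, 4)), ((3,), (2,)), ((4,), (3,))]

links, score = fuzzy_match(inp_sl, ref_sl)

# Reference-SL words with no link in M, and the TL words that A ties them to.
unmatched_ref = [j for j in range(len(ref_sl)) if j not in links.values()]
affected_tl = [ref_tl[p] for j in unmatched_ref
               for p in aligned_tl_positions(j, alignment)]

print(links, score, unmatched_ref, affected_tl)
```

In this sketch, the reference-SL word left unlinked by M ("plan") is traced through A to the single TL word ("Plan") that a post-editing step would have to change, while the rest of Sref-TL can be kept.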
The MT Dictionary defines the relevance of two text units being compared (by defining translation links between their words) and then marks the corresponding word-alignment information, to be used later in the application of Fuzzy Match Post-Editing.

The basic idea of Fuzzy Match Post-Editing is quite simple, and it is graphically depicted in Figure 1 for an example involving all supported actions:

Insertion(s) of Word(s)
The method identifies mismatched words in Sinp-SL and, based on the fuzzy match information M(Sinp-SL, Sref-SL), which provides anchor points in the vicinity of these mismatched words, tries to identify the corresponding missing word positions in Sref-SL. It then searches in A(Sref-SL, Sref-TL) for potential available word-alignment