Learning Translation Templates From Bilingual Text

This paper proposes a two-phase example-based machine translation methodology which develops translation templates from examples and then translates using template matching. This method improves translation quality and facilitates customization of machine translation systems. This paper focuses on the automatic learning of translation templates. A translation template is a bilingual pair of sentences in which corresponding units (words and pharases) are coupled and replaced with variables. Correspondence between units is determined by suing a bilingual dictionary and by analyzing the syntactic structure of the sentences. Syntactic ambiguity and ambiguity in correspondence between units are simultaneously resolved. All of the translation templates generated from a bilingual corpus are grouped by their source language part, and then further refined to resolved conflicts among templates whose source language parts are the same but whose target language parts are different. By using the proposed method, not only transfer rules but also knowledge for lexical selection is effectively extracted from a bilingual corpus.