Two Approaches to Matching in Example-Based Machine Translation

This paper describes two approaches to matching input strings against strings from a translation archive in the example-based machine translation (EBMT) paradigm: the more canonical "chunking + matching + recombination" method and an alternative method of matching at the level of complete sentences. The latter produces less exact matches, while the former suffers from (often serious) lapses in translation quality at the boundaries of recombined chunks. A set of text-matching criteria was selected to reflect the trade-off between the utility and the computational cost of each criterion. A metric for comparing text passages was devised and calibrated with the help of a specially constructed diagnostic example set. A partitioning algorithm was developed for finding an optimum "cover" of an input string by a set of best-matching shorter chunks. The results were evaluated in a monolingual setting using an existing MT post-editing tool: the distance between the input and its best match in the archive was calculated as the number of keystrokes necessary to reduce the latter to the former. On the basis of this evaluation, the metric was adjusted, and an experiment was run to test the two EBMT methods on both the training corpus and the working corpus (or "archive") of some 6,500 sentences.
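The two matching modes can be illustrated with a minimal sketch, under stated assumptions. The abstract does not specify the actual metric or partitioning algorithm, so the code below substitutes a token-level Levenshtein distance for the keystroke count and a simple dynamic program for the cover search. The function names (`edit_distance`, `best_sentence_match`, `best_cover`) and the `miss_penalty` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Tokens = List[str]


def edit_distance(a: Tokens, b: Tokens) -> int:
    """Token-level Levenshtein distance (assumed stand-in for keystroke count)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def best_sentence_match(tokens: Tokens, archive: List[Tokens]) -> Tuple[int, Tokens]:
    """Sentence-level matching: return (distance, sentence) for the closest archive entry."""
    return min(((edit_distance(s, tokens), s) for s in archive), key=lambda p: p[0])


def occurs_in(chunk: Tokens, sentence: Tokens) -> bool:
    """True if chunk appears as a contiguous run of tokens in sentence."""
    n = len(chunk)
    return any(sentence[k:k + n] == chunk for k in range(len(sentence) - n + 1))


def best_cover(tokens: Tokens, archive: List[Tokens],
               miss_penalty: int = 3) -> Tuple[int, List[Tokens]]:
    """Chunk-level matching: cover the input with the fewest archive chunks.

    Assumed cost model: 1 per chunk found verbatim in the archive,
    miss_penalty per single word that is found nowhere.
    """
    n = len(tokens)
    best = [0.0] + [float("inf")] * n   # best[i] = cheapest cover of tokens[:i]
    back = [0] * (n + 1)                # back[i] = start of the last chunk
    for i in range(1, n + 1):
        for j in range(i):
            chunk = tokens[j:i]
            if any(occurs_in(chunk, s) for s in archive):
                cost = 1
            elif i - j == 1:
                cost = miss_penalty     # unmatched single word
            else:
                continue                # unmatched multi-word chunk: not allowed
            if best[j] + cost < best[i]:
                best[i] = best[j] + cost
                back[i] = j
    chunks: List[Tokens] = []
    i = n
    while i > 0:                        # walk back-pointers to recover the cover
        chunks.append(tokens[back[i]:i])
        i = back[i]
    return int(best[n]), chunks[::-1]


archive = [["the", "cat", "sat"], ["on", "the", "mat"]]
print(best_sentence_match(["the", "cat", "sat", "down"], archive))
print(best_cover("the cat sat on the mat".split(), archive))
```

In this toy setting the cover search prefers a few long chunks over many short ones, which reflects the trade-off described above: fewer chunk boundaries mean fewer places where recombination can degrade quality, while sentence-level matching avoids boundaries entirely at the price of a larger distance to the input.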
