Paper Information - Empirical Methods for MT Lexicon Development

Empirical Methods for MT Lexicon Development

This article reviews some recently invented methods for automatically extracting translation lexicons from parallel texts. The accuracy of these methods has been significantly improved by exploiting known properties of parallel texts and of particular language pairs. The state of the art has advanced to the point where non-compositional compounds can be automatically identified with high reliability, and their translations can be found. Most importantly, all of these methods can be smoothly integrated into the usual work flow of MT system developers. Semi-automatic MT lexicon construction is likely to be more efficient and more accurate than either fully automatic or fully manual methods alone.

I. Dan Melamed

[1] David Yarowsky, et al. One Sense per Collocation, 1993, HLT.

[2] Pascale Fung, et al. Compiling Bilingual Lexicon Entries From a Non-Parallel English-Chinese Corpus, 1995, VLC@ACL.

[3] I. Dan Melamed, et al. Bitext Maps and Alignment via Pattern Recognition, 1999, CL.

[4] Ted Dunning, et al. Accurate Methods for the Statistics of Surprise and Coincidence, 1993, CL.

[5] Kenneth Ward Church, et al. Termight: Identifying and Translating Technical Terminology, 1994, ANLP.

[6] Vasileios Hatzivassiloglou, et al. Translating Collocations for Bilingual Lexicons: A Statistical Approach, 1996, CL.

[7] Hideki Hirakawa, et al. Building An MT Dictionary From Parallel Texts Based On Linguistic And Statistical Information, 1994, COLING.

[8] I. Dan Melamed, et al. Word-to-Word Models of Translational Equivalence, 1998, ArXiv.

[9] Djoerd Hiemstra, et al. Using statistical methods to create a bilingual dictionary, 1996.

[10] I. Dan Melamed. A Word-to-Word Model of Translational Equivalence, 1997, ACL.