Paper information - Using external sources of bilingual information for on-the-fly word alignment

In this paper we present a new and simple language-independent method for word alignment based on the use of external sources of bilingual information such as machine translation systems. We show that the few parameters of the aligner can be trained on a very small corpus, which leads to results comparable to those obtained by the state-of-the-art tool GIZA++ in terms of precision. Regarding other metrics, such as alignment error rate or F-measure, the parametric aligner, when trained on a very small gold standard (450 pairs of sentences), provides results comparable to those produced by GIZA++ when trained on an in-domain corpus of around 10,000 pairs of sentences. Furthermore, the results obtained indicate that the training is domain-independent, which enables the use of the trained aligner on the fly on any new pair of sentences.
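The abstract compares aligners by precision, F-measure and alignment error rate (AER) but does not define them. The sketch below assumes the commonly used definitions from Och and Ney's 2003 systematic comparison of alignment models, in which a gold standard has "sure" and "possible" links; the paper's exact evaluation setup may differ. The function name `alignment_scores` and the toy link sets are hypothetical and serve only as an illustration.

```python
from typing import Dict, Set, Tuple

# A link pairs a source-word index with a target-word index.
Link = Tuple[int, int]


def alignment_scores(
    predicted: Set[Link], sure: Set[Link], possible: Set[Link]
) -> Dict[str, float]:
    """Precision, recall, F-measure and AER for one set of predicted links.

    Assumed definitions (Och and Ney, 2003), with A = predicted,
    S = sure and P = possible (P always includes S):
        precision = |A & P| / |A|
        recall    = |A & S| / |S|
        AER       = 1 - (|A & S| + |A & P|) / (|A| + |S|)
    """
    possible = possible | sure  # every sure link is also a possible link
    a_and_s = len(predicted & sure)
    a_and_p = len(predicted & possible)

    precision = a_and_p / len(predicted) if predicted else 0.0
    recall = a_and_s / len(sure) if sure else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0

    denom = len(predicted) + len(sure)
    aer = 1.0 - (a_and_s + a_and_p) / denom if denom else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "aer": aer,
    }


# Toy example (hypothetical data): three predicted links, three sure
# links, and one extra link that the gold standard marks as possible.
predicted = {(0, 0), (1, 1), (2, 3)}
sure = {(0, 0), (1, 1), (2, 2)}
possible = {(2, 3)}
scores = alignment_scores(predicted, sure, possible)
```

On this toy input the precision is 1.0, the recall is about 0.67, the F-measure is about 0.8 and the AER is about 0.17. When the gold standard has no possible-only links, AER reduces to one minus the F-measure under these definitions.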

Mikel L. Forcada | Felipe Sánchez-Martínez | Miquel Esplà-Gomis
