Using external sources of bilingual information for on-the-fly word alignment

In this paper we present a new and simple language-independent method for word alignment based on the use of external sources of bilingual information such as machine translation systems. We show that the few parameters of the aligner can be trained on a very small corpus, which leads to results comparable to those obtained by the state-of-the-art tool GIZA++ in terms of precision. Regarding other metrics, such as alignment error rate or F-measure, the parametric aligner, when trained on a very small gold standard (450 pairs of sentences), provides results comparable to those produced by GIZA++ when trained on an in-domain corpus of around 10,000 pairs of sentences. Furthermore, the results obtained indicate that the training is domain-independent, which enables the use of the trained aligner on the fly on any new pair of sentences.
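The abstract compares aligners by precision, F-measure and alignment error rate (AER) without defining them. The sketch below shows how these metrics are commonly computed from a gold standard annotated with "sure" and "possible" links, following the definitions usually attributed to Och and Ney (2003). It is an illustration only: the function name `alignment_scores`, the representation of a link as a pair of word indices, and the toy data are assumptions, not the evaluation code used in the paper.

```python
from typing import Dict, Set, Tuple

# A link pairs a source word index with a target word index (assumed layout).
Link = Tuple[int, int]


def alignment_scores(
    predicted: Set[Link], sure: Set[Link], possible: Set[Link]
) -> Dict[str, float]:
    """Score a predicted alignment against a gold standard.

    `sure` holds unambiguous gold links; `possible` holds acceptable but
    optional ones. By convention every sure link is also a possible link.
    """
    if not predicted or not sure:
        raise ValueError("predicted and sure link sets must be non-empty")

    possible = possible | sure  # enforce S as a subset of P

    hits_possible = len(predicted & possible)
    hits_sure = len(predicted & sure)

    precision = hits_possible / len(predicted)
    recall = hits_sure / len(sure)
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    aer = 1.0 - (hits_sure + hits_possible) / (len(predicted) + len(sure))

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "aer": aer,
    }


if __name__ == "__main__":
    # Toy example: three predicted links, one of which is wrong.
    scores = alignment_scores(
        predicted={(0, 0), (1, 1), (2, 3)},
        sure={(0, 0), (1, 1)},
        possible={(2, 2)},
    )
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

Precision is measured against the possible links and recall against the sure links, so an aligner is not penalised for proposing an optional link, nor for omitting one. Lower AER is better; higher precision, recall and F-measure are better.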
