Tracking relevant alignment characteristics for machine translation

In most statistical machine translation (SMT) systems, bilingual segments are extracted from word alignments. In this paper we compare alignments tuned directly for alignment F-score with alignments tuned directly for BLEU score, in order to investigate which alignment characteristics help translation. We report results for two different SMT systems (a phrase-based and an n-gram-based system) on Chinese-to-English IWSLT data and on Spanish-to-English European Parliament data. We give alignment hints for improving BLEU score that depend on the SMT system used and the type of corpus.
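The abstract does not define alignment F-score. The sketch below assumes the common convention in which a manual reference marks each link as either "sure" (S) or "possible" (P, with S ⊆ P). Under that convention precision is measured against possible links, recall against sure links, and F-score is the α-weighted harmonic mean F = 1 / (α/P + (1−α)/R). With α = 0.5 this reduces to 2PR/(P+R). AER is included for comparison. The function name `alignment_scores` and the `alpha` parameter are hypothetical, chosen for illustration only; this is not presented as the paper's exact metric.

```python
from typing import Dict, Set, Tuple

# A link is a (source position, target position) pair.
Link = Tuple[int, int]


def alignment_scores(hyp: Set[Link], sure: Set[Link], possible: Set[Link],
                     alpha: float = 0.5) -> Dict[str, float]:
    """Hypothetical helper: precision, recall, weighted F-score and AER
    of a hypothesised alignment against a sure/possible reference."""
    # By convention every sure link is also a possible link.
    possible = possible | sure
    if not hyp or not sure:
        raise ValueError("hypothesis and sure reference must be non-empty")
    precision = len(hyp & possible) / len(hyp)
    recall = len(hyp & sure) / len(sure)
    # Weighted F-measure: alpha = 1 gives precision, alpha = 0 gives recall.
    if precision > 0 and recall > 0:
        f_score = 1.0 / (alpha / precision + (1.0 - alpha) / recall)
    else:
        f_score = 0.0
    aer = 1.0 - (len(hyp & sure) + len(hyp & possible)) / (len(hyp) + len(sure))
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "aer": aer}


if __name__ == "__main__":
    sure_links = {(0, 0), (1, 2)}
    possible_links = {(2, 1)}
    hypothesis = {(0, 0), (1, 2), (2, 1), (3, 3)}
    scores = alignment_scores(hypothesis, sure_links, possible_links)
    # prints {'precision': 0.75, 'recall': 1.0, 'f_score': 0.857, 'aer': 0.167}
    print({k: round(v, 3) for k, v in scores.items()})
```

Lowering `alpha` weights recall more heavily than precision. This is one way to express a preference for denser or sparser alignments when an alignment metric is used as a tuning objective.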