Statistical Analysis of Alignment Characteristics for Phrase-based Machine Translation

In most statistical machine translation (SMT) systems, bilingual segments are extracted via word alignment. However, there has been little systematic study of which alignment characteristics benefit MT under specific experimental settings, such as the language pair or the corpus size. In this paper, we produce a set of alignments by directly tuning the alignment model according to alignment F-score and BLEU score, in order to investigate the alignment characteristics that are helpful in translation. We report results for a phrase-based SMT system on Chinese-to-English IWSLT data and Spanish-to-English European Parliament data. Based on a statistical analysis of the alignment characteristics that are correlated with BLEU score, we give alignment hints for improving BLEU score with a phrase-based SMT system on different types of corpora.
