Evaluating Long Range Reordering with Permutation-Forests

Automatically evaluating the quality of word order in MT system output is challenging yet crucial for MT evaluation. Existing approaches employ string-based metrics, which are computed over the permutations of word positions in system output relative to a reference translation. We introduce a new metric computed over Permutation Forests (PEFs), tree-based representations that decompose permutations recursively. Relative to string-based metrics, PEFs offer advantages for evaluating long range reordering. We compare the PEFs metric against five known reordering metrics on WMT13 data for ten language pairs. The PEFs metric shows better correlation with human ranking than the other metrics on almost all language pairs. None of the other metrics exhibits behavior as stable across language pairs.
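
To make the contrast between string-based and recursive views of a permutation concrete, here is a minimal Python sketch. It is not the PEFs metric from this paper. It assumes a permutation is given as a list of reference positions in system-output order, and the function names `kendall_tau_score` and `is_block` are hypothetical, chosen only for illustration. The first function is a simple pairwise (Kendall's tau style) score of the kind string-based metrics build on; the second checks whether a span of the permutation maps to a contiguous range of reference positions, which is the kind of unit a recursive decomposition can treat as a single node.

```python
from itertools import combinations


def kendall_tau_score(perm):
    """Fraction of position pairs kept in reference order (1.0 = monotone)."""
    n = len(perm)
    if n < 2:
        return 1.0
    pairs = list(combinations(range(n), 2))
    # A pair (i, j) with i < j is concordant when perm[i] < perm[j].
    concordant = sum(1 for i, j in pairs if perm[i] < perm[j])
    return concordant / len(pairs)


def is_block(perm, start, end):
    """True if perm[start:end] covers a contiguous range of reference positions.

    Illustrative assumption: such spans are candidates for subtrees in a
    recursive decomposition of the permutation.
    """
    span = perm[start:end]
    return max(span) - min(span) == len(span) - 1


# Two halves swapped as whole units: one long range move.
swapped = [3, 4, 1, 2]
print(kendall_tau_score(swapped))   # pairwise view counts four inverted pairs
print(is_block(swapped, 0, 2))      # [3, 4] is a contiguous block
print(is_block([2, 4, 1, 3], 0, 2)) # [2, 4] is not
```

In this sketch the pairwise score penalizes `[3, 4, 1, 2]` for every crossed pair, while the block check shows that the same permutation consists of two intact units that were exchanged once, which is the intuition behind preferring a tree-based view for long range reordering.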
