CDER: Efficient MT Evaluation Using Block Movements

Most state-of-the-art evaluation measures for machine translation assign high costs to movements of word blocks. In many cases, though, such movements still result in correct or almost correct sentences. In this paper, we present a new evaluation measure which explicitly models block reordering as an edit operation. Our measure can be computed exactly in quadratic time. Furthermore, we show how some evaluation measures can be improved by introducing word-dependent substitution costs. The correlation of the new measure with human judgment has been investigated systematically on two different language pairs. The experimental results show that it significantly outperforms state-of-the-art approaches in sentence-level correlation. Experiments with word-dependent substitution costs demonstrate a further increase in correlation between automatic evaluation measures and human judgment.
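To make the idea concrete, below is a minimal Python sketch of an edit distance extended with a block-movement ("long jump") operation, computed in quadratic time by relaxing each dynamic-programming row with its minimum; a sub_cost hook illustrates the word-dependent substitution costs mentioned above. The function name, the 0/1 default cost, and the boundary conditions are illustrative assumptions and may differ from the authors' exact CDER formulation.

def block_edit_distance(ref, hyp, sub_cost=None):
    """Levenshtein distance extended with a long-jump (block movement)
    operation of cost 1, computed in O(len(ref) * len(hyp)) time.

    sub_cost(r, h) is a hook for word-dependent substitution costs;
    the default is the usual 0/1 cost.
    """
    if sub_cost is None:
        sub_cost = lambda r, h: 0 if r == h else 1
    J = len(hyp)

    def relax(row):
        # Long jump: with cost 1, resume matching at any hypothesis
        # position.  Relaxing every cell with the row minimum plus one
        # is what keeps the whole computation quadratic.
        m = min(row) + 1
        return [min(d, m) for d in row]

    # prev[j] holds D(i, j): the cost of covering ref[:i], ending at
    # hypothesis position j.  Row 0 skips leading hypothesis words.
    prev = relax(list(range(J + 1)))
    for r in ref:
        row = [prev[0] + 1]                            # ref word r unmatched
        for j in range(1, J + 1):
            row.append(min(
                prev[j] + 1,                           # missing word
                row[j - 1] + 1,                        # extra hypothesis word
                prev[j - 1] + sub_cost(r, hyp[j - 1]), # match / substitution
            ))
        prev = relax(row)
    return prev[J]

if __name__ == "__main__":
    ref = "a b c d".split()
    hyp = "c d a b".split()
    # 3: one jump into the middle, one jump back, one jump to the end.
    # Plain Levenshtein distance between these sequences would be 4.
    print(block_edit_distance(ref, hyp))

In this sketch, block reordering is modeled purely as the long-jump operation, so a hypothesis with swapped blocks pays a small constant cost per jump instead of one edit per displaced word; the paper's actual measure also normalizes the distance (e.g., by reference length) and fixes the coverage conventions left open here.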
