Sentence Level Machine Translation Evaluation as a Ranking

The paper proposes formulating MT evaluation as a ranking problem, as is often done in the practice of assessment by human. Under the ranking scenario, the study also investigates the relative utility of several features. The results show greater correlation with human assessment at the sentence level, even when using an n-gram match score as a baseline feature. The feature contributing the most to the rank order correlation between automatic ranking and human assessment was the dependency structure relation rather than BLEU score and reference language model feature.

[1]  Alex Kulesza,et al.  A learning approach to improving sentence-level MT evaluation , 2004 .

[2]  Joseph P. Turian,et al.  Evaluation of machine translation and its evaluation , 2003, MTSUMMIT.

[3]  George R. Doddington,et al.  Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics , 2002 .

[4]  Heidi Fox,et al.  Phrasal Cohesion and Statistical Machine Translation , 2002, EMNLP.

[5]  Ding Liu,et al.  Syntactic Features for Evaluation of Machine Translation , 2005, IEEvaluation@ACL.

[6]  S. Siegel,et al.  Nonparametric Statistics for the Behavioral Sciences , 2022, The SAGE Encyclopedia of Research Design.

[7]  Liang Zhou,et al.  Re-evaluating Machine Translation Results with Paraphrase Support , 2006, EMNLP.

[8]  Ralph Weischedel,et al.  A STUDY OF TRANSLATION ERROR RATE WITH TARGETED HUMAN ANNOTATION , 2005 .

[9]  Chin-Yew Lin,et al.  ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation , 2004, COLING.

[10]  Eduard Hovy,et al.  Evaluating DUC 2005 using Basic Elements , 2005 .

[11]  Eduard Hovy,et al.  A BE-based Multi-document Summarizer with Query Interpretation , 2005 .

[12]  Chin-Yew Lin,et al.  Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics , 2004, ACL.

[13]  Michael Gamon,et al.  Sentence-level MT evaluation without reference translations: beyond language modeling , 2005, EAMT.

[14]  Ding Liu,et al.  Stochastic Iterative Alignment for Machine Translation Evaluation , 2006, ACL.

[15]  Min Zhao,et al.  Ranking definitions with supervised learning methods , 2005, WWW '05.

[16]  Thorsten Joachims,et al.  Making large-scale support vector machine learning practical , 1999 .

[17]  Chris Quirk,et al.  Training a Sentence-Level Machine Translation Confidence Measure , 2004, LREC.

[18]  Jacob Cohen A Coefficient of Agreement for Nominal Scales , 1960 .

[19]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[20]  A. Stuart,et al.  Non-Parametric Statistics for the Behavioral Sciences. , 1957 .