Reducing human assessment of machine translation quality to binary classifiers

This paper presents a method to predict human assessments of machine translation (MT) quality by combining binary classifiers through a coding matrix. The multiclass categorization problem is reduced to a set of binary problems, each solved by a standard classification learning algorithm trained on the scores of multiple automatic evaluation metrics. Experimental results on a large-scale human-annotated evaluation corpus show that the decomposition into binary classifiers achieves higher classification accuracy than solving the multiclass categorization problem directly. In addition, the proposed method correlates more strongly with human judgments at the sentence level than standard automatic evaluation measures.
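As a rough illustration of the coding-matrix reduction the abstract describes, the sketch below decomposes a four-grade quality prediction task into binary problems via an error-correcting output code. This is not the authors' implementation: the feature matrix (standing in for per-sentence automatic metric scores such as BLEU or WER), the A-D human grades, and the SVM base learner are all hypothetical choices, here expressed with scikit-learn's OutputCodeClassifier.

# Illustrative sketch only, assuming metric-score features and A-D grades;
# not the paper's actual code.
import numpy as np
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Each row: automatic-metric scores for one MT output sentence
# (columns might be BLEU, METEOR, WER, PER, ...). Values are synthetic.
X = rng.random((200, 5))
# Human quality grades, e.g., 0="A" (perfect) ... 3="D" (nonsense).
y = rng.integers(0, 4, size=200)

# The coding matrix assigns each of the 4 grades a bit string; one binary
# SVM is trained per bit, and a new sentence receives the grade whose
# code word is closest to the vector of binary predictions.
ecoc = OutputCodeClassifier(estimator=SVC(kernel="rbf"),
                            code_size=2.0, random_state=0)
ecoc.fit(X, y)
print(ecoc.predict(X[:5]))

With code_size=2.0 the matrix has twice as many columns (binary classifiers) as classes, which is what gives the code its error-correcting slack: a few misfiring binary learners can still leave the correct grade's code word nearest.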
