Support Vector Methods for Sentence Level Machine Translation Evaluation

Recent work in machine translation (MT) evaluation suggests that sentence-level evaluation based on machine learning (ML) can outperform standard metrics such as BLEU, ROUGE, and METEOR. We conducted a comprehensive empirical study of support vector methods for ML-based MT evaluation, covering multi-class support vector machines (SVM) and support vector regression (SVR) with different kernel functions. We emphasize a systematic comparison of multiple feature models obtained with feature selection and feature extraction techniques. Besides identifying the conditions that yield the best empirical results, our study supports several non-obvious conclusions regarding qualitative and quantitative aspects of feature sets in MT evaluation.
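
To make the setup concrete, the following is a minimal sketch of SVR-based sentence-level MT evaluation with feature selection, in the spirit of the study described above. It is not the authors' implementation: the feature set, data, selection criterion (univariate F-test), and hyperparameters are illustrative assumptions, using scikit-learn for brevity.

```python
# Hypothetical sketch: regress human judgments from sentence-level MT
# features with an RBF-kernel SVR, after keeping the k best features.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

# Toy data standing in for real features: each row holds sentence-level
# features for one MT output (e.g., n-gram precisions, length ratio,
# edit distance to a reference); y holds human quality judgments.
rng = np.random.default_rng(0)
X = rng.random((200, 10))   # 200 sentences, 10 candidate features
y = rng.random(200)         # human scores in [0, 1]

model = Pipeline([
    # Feature selection: keep the 5 features most correlated with y.
    ("select", SelectKBest(f_regression, k=5)),
    # SVR with an RBF kernel, one of the kernel choices compared above;
    # C and epsilon would normally be tuned, e.g. by cross-validation.
    ("svr", SVR(kernel="rbf", C=1.0, epsilon=0.1)),
])
model.fit(X, y)
predicted_scores = model.predict(X)  # metric score for each sentence
```

A multi-class variant of the same pipeline would swap the SVR for an SVM classifier over discrete human rating categories; the regression formulation is shown here because it yields a continuous sentence-level score directly.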

[1] Alon Lavie, et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments, 2005, IEEvaluation@ACL.

[2] Rebecca Hwa, et al. A Re-examination of Machine Learning Approaches for Sentence-Level MT Evaluation, 2007, ACL.

[3] George R. Doddington, et al. Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics, 2002.

[4] Matthew G. Snover, et al. A Study of Translation Edit Rate with Targeted Human Annotation, 2006, AMTA.

[5] Chin-Yew Lin, et al. Looking for a Few Good Metrics: ROUGE and its Evaluation, 2004.

[6] Michael Gamon, et al. A Machine Learning Approach to the Automatic Evaluation of Machine Translation, 2001, ACL.

[7] Alex Kulesza, et al. A Learning Approach to Improving Sentence-Level MT Evaluation, 2004.

[8] Ming Zhou, et al. Sentence Level Machine Translation Evaluation as a Ranking, 2007, WMT@ACL.

[9] Vladimir Vapnik, et al. Statistical Learning Theory, 1998.

[10] Hwee Tou Ng, et al. MAXSIM: A Maximum Similarity Metric for Machine Translation Evaluation, 2008, ACL.

[11] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.

[12] Ralph Weischedel, et al. A Study of Translation Error Rate with Targeted Human Annotation, 2005.

[13] Daniel Jurafsky, et al. Robust Machine Translation Evaluation with Entailment Features, 2009, ACL.

[14] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.