Improvement of machine translation evaluation by simple linguistically motivated features

Adopting the regression SVM framework, this paper proposes a linguistically motivated feature engineering strategy to develop an MT evaluation metric that correlates better with human assessments. In contrast to the current practice of "greedily" combining all available features, six features are proposed according to human intuitions about translation quality. The contribution of the linguistic features is then examined and analyzed via a hill-climbing strategy. Experiments indicate that, compared with either the SVM-ranking model or previous attempts using exhaustive linguistic features, the regression SVM model with six linguistically informed features generalizes better across different datasets, and that augmenting these linguistic features with suitable non-linguistic metrics can yield additional improvements.
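The hill-climbing analysis mentioned in the abstract can be pictured as greedy forward selection: start with no features, repeatedly add the single feature that most improves correlation with human scores, and stop when nothing helps. The sketch below is only an illustration under stated assumptions, not the paper's procedure. The names `pearson`, `score_subset`, and `hill_climb` are hypothetical, and `score_subset` averages the selected feature values as a stand-in for training and evaluating a regression SVM, which the abstract does not specify in detail.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def score_subset(rows, human, subset):
    """Score a feature subset by correlation with human judgments.

    ASSUMPTION: averaging the selected features stands in for fitting a
    regression SVM on them; swap in a real regressor to mirror the paper.
    """
    preds = [mean(row[i] for i in subset) for row in rows]
    return pearson(preds, human)


def hill_climb(rows, human, n_features):
    """Greedy forward selection: add the best feature until no gain."""
    selected, best = [], float("-inf")
    while len(selected) < n_features:
        candidates = [i for i in range(n_features) if i not in selected]
        scored = [(score_subset(rows, human, selected + [i]), i)
                  for i in candidates]
        top_score, top_i = max(scored)
        if top_score <= best:
            break  # no candidate improves the correlation
        selected.append(top_i)
        best = top_score
    return selected, best


# Toy data (invented for illustration): one row of feature values per
# translated sentence, plus one human score per sentence.
human_scores = [1, 2, 3, 4]
feature_rows = [[1, 5, 4], [2, 1, 3], [3, 4, 2], [4, 2, 1]]
chosen, correlation = hill_climb(feature_rows, human_scores, 3)
```

The order in which features enter `chosen` gives a rough ranking of their contribution, which is the kind of per-feature analysis a hill-climbing strategy supports.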
