Comparative Quality Estimation for Machine Translation: Observations on Machine Learning and Features

Abstract A deeper analysis of Comparative Quality Estimation is presented by extending the state-of-the-art methods with adequacy and grammatical features from other Quality Estimation tasks. The previously used linear method, unable to cope with the augmented feature set, is replaced with a boosting classifier assisted by feature selection. The resulting methods show improved performance for 6 language pairs when applied to the output of MT systems developed over 7 years. The improved models compete better with reference-aware metrics. Notable conclusions are reached by examining the contribution of the features to the models, and it is possible to identify common MT errors that are captured by the features. Many grammatical/fluency features contribute strongly, a few adequacy features contribute to some extent, whereas source complexity features are of no use. The importance of many fluency and adequacy features is language-specific.
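The setup described above (a boosting classifier over selected features, deciding pairwise which of two translations is better) can be illustrated with a minimal sketch, assuming scikit-learn; the feature dimensions, data, and helper function below are hypothetical and not the authors' exact configuration.

```python
# Minimal sketch of comparative QE as pairwise classification with
# feature selection and boosting (illustrative, not the paper's exact setup).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# Each training instance compares two candidate translations of the same
# source sentence: the feature vector is the difference between the two
# candidates' sentence-level features (fluency, adequacy, source complexity),
# and the label encodes which candidate the human annotator preferred.
rng = np.random.RandomState(0)
X_pairs = rng.randn(200, 30)          # hypothetical feature differences
y_pref = rng.randint(0, 2, size=200)  # 1 = first candidate preferred

model = Pipeline([
    # Feature selection keeps only the most informative features, so the
    # boosting classifier is not overwhelmed by the augmented feature set.
    ("select", SelectKBest(score_func=f_classif, k=15)),
    ("boost", GradientBoostingClassifier(n_estimators=200, max_depth=3)),
])
model.fit(X_pairs, y_pref)

def rank_candidates(feature_vectors):
    """Rank MT outputs for one source sentence by counting pairwise wins."""
    n = len(feature_vectors)
    wins = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            diff = (feature_vectors[i] - feature_vectors[j]).reshape(1, -1)
            if model.predict(diff)[0] == 1:
                wins[i] += 1
    return list(np.argsort(-wins))
```

At prediction time, the outputs of several MT systems for the same source sentence are compared pairwise and ordered by their number of wins, which yields the comparative ranking used for evaluation.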
