MT Quality Estimation for Computer-assisted Translation: Does it Really Help?

The usefulness of translation quality estimation (QE) to increase productivity in a computer-assisted translation (CAT) framework is a widely held assumption (Specia, 2011; Huang et al., 2014). So far, however, the validity of this assumption has not yet been demonstrated through sound evaluations in realistic settings. To this end, we report on an evaluation involving professional translators working with a CAT tool under controlled but natural conditions. Contrastive experiments are carried out by measuring differences in post-editing time when: i) translation suggestions are presented together with binary quality estimates, and ii) the same suggestions are presented without quality indicators. Translators’ productivity in the two conditions is analysed in a principled way, accounting for the main factors (e.g. differences in translators’ behaviour, quality of the suggestions) that directly affect time measurements. While the general assumption about the usefulness of QE is verified, significance testing reveals that real productivity gains can be observed only under specific conditions.
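The abstract does not spell out which significance test the authors apply to the post-editing time differences; the citation of computer-intensive hypothesis testing [17] suggests a randomization approach. Below is a minimal sketch, assuming a paired sign-flip (approximate randomization) test on per-segment post-editing times. The function name, trial count, and all timing values are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical sketch: paired approximate randomization test on
# post-editing times, in the spirit of computer-intensive methods [17].
# Under the null hypothesis (showing QE labels has no effect), each
# per-segment time difference is symmetric around zero, so randomly
# flipping its sign samples the null distribution.
import random

def randomization_test(times_with_qe, times_without_qe, trials=10000, seed=42):
    """Two-sided p-value for the observed mean difference in paired
    post-editing times (e.g. seconds per segment)."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(times_with_qe, times_without_qe)]
    observed = abs(sum(diffs)) / len(diffs)
    hits = 0
    for _ in range(trials):
        # Randomly swap the two condition labels within each pair.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped)) / len(flipped) >= observed:
            hits += 1
    # Add-one smoothing avoids reporting a p-value of exactly zero.
    return (hits + 1) / (trials + 1)

# Illustrative timings only (not data from the paper):
with_qe = [41.2, 55.0, 38.7, 62.1, 47.5]
without_qe = [44.8, 58.3, 37.9, 66.0, 51.2]
print(randomization_test(with_qe, without_qe))
```

The sign-flip keeps each segment's pair of measurements intact, which matches the contrastive design: the same suggestions are post-edited with and without quality indicators, so variation between segments is factored out of the test.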

[1] Philipp Koehn et al. Moses: Open Source Toolkit for Statistical Machine Translation. ACL, 2007.

[2] Marcello Federico et al. Match without a Referee: Evaluating MT Adequacy without Reference Translations. WMT@NAACL-HLT, 2012.

[3] Lucia Specia et al. QuEst - A translation quality estimation framework. ACL, 2013.

[4] Lucia Specia et al. An efficient and user-friendly tool for machine translation quality estimation. LREC, 2014.

[5] Philipp Koehn et al. Findings of the 2013 Workshop on Statistical Machine Translation. WMT@ACL, 2013.

[6] Philipp Koehn et al. Findings of the 2012 Workshop on Statistical Machine Translation. WMT@NAACL-HLT, 2012.

[7] Lucia Specia et al. Exploiting Objective Annotations for Minimising Translation Post-editing Effort. EAMT, 2011.

[8] Ralph Weischedel et al. A Study of Translation Error Rate with Targeted Human Annotation. 2005.

[9] Nello Cristianini et al. Estimating the Sentence-Level Quality of Machine Translation Systems. EAMT, 2009.

[10] Marcello Federico et al. Coping with the Subjectivity of Human Judgements in MT Quality Estimation. WMT@ACL, 2013.

[11] Stefan Riezler et al. On Some Pitfalls in Automatic Evaluation and Significance Testing for MT. IEEvaluation@ACL, 2005.

[12] Elisa Ricci et al. Online Multitask Learning for Machine Translation Quality Estimation. ACL, 2015.

[13] Fei Huang et al. Adaptive HTER Estimation for Document-Specific MT Post-Editing. ACL, 2014.

[14] Gaël Varoquaux et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res., 2011.

[15] Philipp Koehn et al. The MateCat Tool. COLING, 2014.

[16] Marcello Federico et al. Data-driven annotation of binary MT quality estimation corpora based on human post-editions. Machine Translation, 2014.

[17] K. J. Evans et al. Computer Intensive Methods for Testing Hypotheses: An Introduction. 1990.

[18] Matthew G. Snover et al. A Study of Translation Edit Rate with Targeted Human Annotation. AMTA, 2006.

[19] José Guilherme Camargo de Souza et al. Adaptive Quality Estimation for Machine Translation. ACL, 2014.