Exploring Consensus in Machine Translation for Quality Estimation

This paper presents the use of consensus among Machine Translation (MT) systems for the WMT14 Quality Estimation shared task. Consensus is explored here by comparing the MT system output against several alternative machine translations using standard evaluation metrics. Figures extracted from such metrics are used as features to complement baseline prediction models. The hypothesis is that knowing whether the translation of interest is similar or dissimilar to translations from multiple different MT systems can provide useful information regarding the quality of such a translation.

[1]  Radu Soricut,et al.  TrustRank: Inducing Trust in Automatic Translations via Ranking , 2010, ACL.

[2]  Radu Soricut,et al.  The SDL Language Weaver Systems in the WMT12 Quality Estimation Shared Task , 2012, WMT@NAACL-HLT.

[3]  Chin-Yew Lin,et al.  Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics , 2004, ACL.

[4]  Lucia Specia,et al.  An Investigation on the Effectiveness of Features for Translation Quality Estimation , 2013, MTSUMMIT.

[5]  Alon Lavie,et al.  METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.

[6]  Lluís Màrquez i Villodre,et al.  Asiya: An Open Toolkit for Automatic Machine Translation (Meta-)Evaluation , 2010, Prague Bull. Math. Linguistics.

[7]  Matthew G. Snover,et al.  A Study of Translation Edit Rate with Targeted Human Annotation , 2006, AMTA.

[8]  Radu Soricut,et al.  Combining Quality Prediction and System Selection for Improved Automatic Translation Output , 2012, WMT@NAACL-HLT.

[9]  Ralph Weischedel,et al.  A STUDY OF TRANSLATION ERROR RATE WITH TARGETED HUMAN ANNOTATION , 2005 .

[10]  Lucia Specia,et al.  QuEst - A translation quality estimation framework , 2013, ACL.

[11]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[12]  Rebecca Hwa,et al.  The Role of Pseudo References in MT Evaluation , 2008, WMT@ACL.

[13]  Ani Nenkova,et al.  Automatically Assessing Machine Summary Content Without a Gold Standard , 2013, CL.