Machine translation evaluation made fuzzier: a study on post-editing productivity and evaluation metrics in commercial settings

In this paper, we report on an experiment carried out at a translation company. Ten translators with varying degrees of experience in translation and machine translation post-editing (MTPE) were assigned the same task, which involved translation from scratch, fuzzy-match post-editing, and MTPE. We evaluate the MT output using traditional evaluation metrics such as BLEU and TER, correlate these measures with productivity values, and study whether a fuzzy score holds up against them. Our main goal is to determine whether fuzzy scores can be used to evaluate MTPE, thereby bringing the familiarity of fuzzy scores and their analogy with TM matching into an MTPE workflow. The results of our experiment seem to support this hypothesis.
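
To make the idea of a fuzzy score for MT output concrete, the following is a minimal sketch, not the formula used in the experiment. It assumes a word-level edit distance between the raw MT segment and its post-edited version, normalised by the longer segment and expressed as a percentage, in the style of a TM fuzzy match. The function names `edit_distance` and `fuzzy_score`, the whitespace tokenisation, and the normalisation are all illustrative assumptions; commercial CAT tools compute fuzzy matches with their own proprietary algorithms.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, 1):
        curr = [i]
        for j, tok_b in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + (tok_a != tok_b),   # substitution or match
            ))
        prev = curr
    return prev[-1]


def fuzzy_score(mt_output, post_edited):
    """Hypothetical fuzzy score in [0, 100]; 100 means no edits were needed.

    Assumption: tokens are whitespace-separated words and the distance is
    normalised by the length of the longer segment.
    """
    mt_tokens = mt_output.split()
    pe_tokens = post_edited.split()
    longest = max(len(mt_tokens), len(pe_tokens))
    if longest == 0:
        return 100.0  # two empty segments are treated as identical
    return 100.0 * (1 - edit_distance(mt_tokens, pe_tokens) / longest)


if __name__ == "__main__":
    print(fuzzy_score("the cat sat down", "the cat lay down"))
```

A score computed this way could then be set against per-segment productivity figures in the same way as BLEU or TER, which is the kind of comparison the abstract describes.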
