User-focused task-oriented MT evaluation for wikis: a case study

This paper reports on an evaluation experiment focusing on statistical machine translation (MT) software integrated into a larger system for synchronizing multilingual information contained in wiki sites. The experiment concerned the translation of wiki entries from German and Dutch into English, carried out by ten media professionals (editors, journalists and translators) working at two major media organizations, who post-edited the MT output. The investigation addressed in particular the adequacy of MT as a support for the translation of wiki pages, and the results cover both its success rate (i.e. MT effectiveness) and the associated confidence of the users (i.e. their satisfaction). Special emphasis is placed on the post-editing effort required to bring the output to a publishable standard. The results show that, overall, the users were satisfied with the system and regarded it as a potentially useful tool to support their work; in particular, they found that the post-editing effort required to obtain English wiki entries of publishable quality was lower than the effort of translating from scratch.
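As a rough illustration of how post-editing effort of this kind is often quantified (a minimal sketch of an HTER-style edit rate, not the paper's own evaluation protocol; the example sentences and function names are invented for illustration, and block shifts used by full TER are omitted):

# Minimal illustrative sketch: HTER approximates post-editing effort as the
# number of word-level edits needed to turn the raw MT output into its
# human post-edited version, divided by the length of the post-edited text.

def word_edit_distance(hyp, ref):
    # Levenshtein distance over tokens (insertions, deletions, substitutions).
    m, n = len(hyp), len(ref)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # substitution / match
        prev = curr
    return prev[n]

def hter(mt_output, post_edited):
    # Edit rate between the raw MT output and its post-edited version.
    hyp, ref = mt_output.split(), post_edited.split()
    return word_edit_distance(hyp, ref) / max(len(ref), 1)

# Hypothetical example of one post-edited wiki sentence.
raw = "the article describes of the history of the city"
edited = "the article describes the history of the city"
print(f"HTER = {hter(raw, edited):.2f}")  # 1 deletion over 8 reference tokens = 0.12

Lower values indicate that fewer corrections were needed, which is how a claim such as "post-editing required less effort than translating from scratch" can be grounded quantitatively.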
