Unobtrusive methods for low-cost manual evaluation of machine translation.

Machine translation (MT) evaluation metrics based on n-gram co-occurrence statistics are cheap to run, and their value in comparative research is well documented; however, their value as a standalone measure of MT output quality is questionable. Manual methods of MT evaluation, in contrast, are expensive. This paper presents early research carried out within the CNGL (Centre for Next Generation Localisation) on a low-cost, operationalised means of acquiring MT evaluation data in a commercial post-edited MT (PEMT) context. The approach exposes translators to output from a set of candidate MT systems and reports which system requires the least post-editing. It is hoped that this approach, combined with instrumentation mechanisms for tracking the performance and behaviour of individual post-editors, will give insight into which MT system, if any, among a set of candidates is best suited to a particular large or ongoing technical translation project. In the longer term, we propose that post-editing data gathered in a commercial context may be valuable to MT researchers.
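To make the ranking idea concrete, the sketch below shows one way a post-editing effort proxy could be computed and used to order candidate systems. The effort measure chosen here (a character-level Levenshtein distance between raw MT output and its post-edited form, normalised by the length of the post-edited segment) and all function and system names are illustrative assumptions, not the instrumentation described in the paper.

```python
# Minimal sketch, assuming post-editing effort is approximated by a
# normalised character-level edit distance; the paper's actual
# instrumentation may track different signals (time, keystrokes, etc.).

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def post_edit_effort(mt_output: str, post_edited: str) -> float:
    """Edit distance from raw MT output to the post-edited translation,
    normalised by the post-edited length (0.0 = no editing needed)."""
    if not post_edited:
        return 0.0 if not mt_output else 1.0
    return levenshtein(mt_output, post_edited) / len(post_edited)

def rank_systems(segments: dict[str, list[tuple[str, str]]]) -> list[tuple[str, float]]:
    """segments maps system name -> list of (mt_output, post_edited) pairs.
    Returns systems sorted by mean post-editing effort, lowest (best) first."""
    scores = {
        system: sum(post_edit_effort(mt, pe) for mt, pe in pairs) / len(pairs)
        for system, pairs in segments.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])

if __name__ == "__main__":
    # Hypothetical data: two candidate systems, one post-edited segment each.
    data = {
        "system_A": [("the cat sat on mat", "the cat sat on the mat")],
        "system_B": [("cat on the mat sat", "the cat sat on the mat")],
    }
    for system, effort in rank_systems(data):
        print(f"{system}: mean post-edit effort {effort:.2f}")
```

In a production setting such a proxy would be gathered unobtrusively from the post-editors' normal workflow rather than from a separate evaluation exercise, which is the low-cost aspect the paper emphasises.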
