Involving language professionals in the evaluation of machine translation

Abstract: Significant breakthroughs in machine translation (MT) seem possible only if human translators are brought into the loop. While automatic evaluation and scoring mechanisms such as BLEU have enabled fast system development, it is not clear how systems can meet the real-world quality requirements of industrial translation scenarios today. The taraXÜ project has paved the way for wider use of multiple MT outputs through various feedback loops in system development. The project has integrated human translators into the development process, thereby collecting feedback for possible improvements. This paper describes results from a detailed human evaluation in which the performance of different types of translation systems is compared and analysed via ranking, error analysis and post-editing.
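
To make the contrast between automatic scoring and human evaluation concrete, the following is a minimal sketch of how a corpus-level BLEU score is typically computed, assuming the sacrebleu Python package; the example sentences are hypothetical and are not taken from the project data.

```python
# Minimal sketch: corpus-level BLEU with the sacrebleu package.
# The hypothesis and reference sentences below are hypothetical illustrations.
import sacrebleu

hypotheses = [
    "The contract enters into force on 1 January.",
    "The parties shall settle disputes amicably.",
]
references = [
    "The contract comes into force on 1 January.",
    "The parties shall resolve disputes amicably.",
]

# sacrebleu expects one hypothesis per segment and a list of reference streams,
# hence the references list is wrapped in an outer list.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```

Such scores are cheap to obtain and useful for tracking development progress, but, as the paper argues, they do not by themselves show whether a system meets the quality requirements of professional translation workflows.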
