Reading comprehension of machine translation output: what makes for a better read?

This paper reports on a pilot experiment that compares two machine translation (MT) paradigms through reading comprehension tests. To explore a suitable methodology, we set up a pilot experiment with six users (native speakers of English, Spanish, and Simplified Chinese) using reading tasks from the International English Language Testing System (IELTS) and an eye-tracker. The users were asked to read three texts in their native language: either the original English texts (for the English speakers) or the machine-translated texts (for the Spanish and Simplified Chinese speakers). The original texts were machine-translated by two MT systems: neural (NMT) and statistical (SMT). After reading each text and answering the respective comprehension questions, the users were also asked to rate satisfaction statements on a 3-point scale. Once all tasks were completed, a retrospective interview was conducted to gather qualitative data. The findings suggest that the target-language users completed more tasks in less time, and with a higher level of satisfaction, when reading translations produced by the NMT system.
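To make the comparison concrete, the following is a minimal sketch (not from the paper) of how the three reported measures could be aggregated per MT paradigm: tasks completed, task time, and satisfaction rating. All participant identifiers and values are hypothetical placeholders.

```python
# Illustrative sketch only: aggregate hypothetical per-participant results
# by MT system to compare tasks completed, task time, and satisfaction.
from statistics import mean

# Hypothetical records:
# (participant, mt_system, tasks_completed, time_minutes, satisfaction_1_to_3)
results = [
    ("P1", "NMT", 3, 21.5, 3),
    ("P2", "NMT", 3, 24.0, 2),
    ("P3", "SMT", 2, 27.3, 2),
    ("P4", "SMT", 2, 29.1, 1),
]

def summarise(system: str) -> dict:
    """Return the mean of each measure for one MT system."""
    rows = [r for r in results if r[1] == system]
    return {
        "tasks_completed": mean(r[2] for r in rows),
        "time_minutes": mean(r[3] for r in rows),
        "satisfaction": mean(r[4] for r in rows),
    }

for system in ("NMT", "SMT"):
    print(system, summarise(system))
```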
