What Level of Quality can Neural Machine Translation Attain on Literary Text?

Given the rise of a new approach to MT, Neural MT (NMT), and its promising performance on different text types, we assess the translation quality it can attain on what is perceived to be the greatest challenge for MT: literary text. Specifically, we target novels, arguably the most popular type of literary text. We build a literary-adapted NMT system for the English-to-Catalan translation direction and evaluate it against a system from the previously dominant paradigm in MT: statistical phrase-based MT (PBSMT). To this end, for the first time we train MT systems, both NMT and PBSMT, on large amounts of literary text (over 100 million words) and evaluate them on a set of twelve widely known novels spanning from the 1920s to the present day. According to the BLEU automatic evaluation metric, NMT is significantly better than PBSMT (p < 0.01) on all the novels considered. Overall, NMT yields an 11% relative improvement (3 points absolute) over PBSMT. A complementary human evaluation on three of the books shows that, depending on the book, between 17% and 34% of the translations produced by NMT (versus between 8% and 20% with PBSMT) are perceived by native speakers of the target language to be of equivalent quality to translations produced by a professional human translator.
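The relation between the relative and absolute BLEU gains quoted above can be illustrated with a small sketch. The function names are hypothetical, and the baseline score computed here is only what the two rounded figures (11% relative, 3 points absolute) would imply; it is not a score reported in the abstract.

```python
def relative_improvement(baseline: float, system: float) -> float:
    """Relative gain of `system` over `baseline`, as a percentage."""
    return (system - baseline) / baseline * 100.0


def implied_baseline(absolute_gain: float, relative_gain_pct: float) -> float:
    """Baseline score implied by an absolute gain and a relative gain (in %)."""
    return absolute_gain / (relative_gain_pct / 100.0)


# Illustrative only: both inputs are rounded figures from the abstract,
# so the implied scores are rough estimates, not reported results.
baseline = implied_baseline(3.0, 11.0)
system = baseline + 3.0
print(f"implied PBSMT BLEU: {baseline:.1f}")
print(f"implied NMT BLEU:   {system:.1f}")
print(f"relative gain:      {relative_improvement(baseline, system):.1f}%")
```

Because the published figures are rounded, the implied baseline should be read as an approximate range rather than an exact value.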
