Translating Literary Text between Related Languages using SMT

We explore the feasibility of applying machine translation (MT) to literary texts. To that end, we measure the translatability of literary texts by analysing parallel corpora, quantifying the degree of freedom of the translations and the narrowness of the domain. We then explore the use of domain adaptation to translate a novel between two related languages, Spanish and Catalan. This is the first time that MT systems have been built specifically to translate novels. Our best system outperforms a strong baseline by 4.61 absolute BLEU points (9.38% relative), an improvement corroborated by other automatic evaluation metrics. We provide evidence that MT can be useful in assisting with the translation of novels between closely related languages: (i) the translations produced by our best system are identical to those produced by a professional human translator in almost 20% of cases, with an additional 10% requiring at most 5 character edits, and (ii) a complementary human evaluation shows that over 60% of the translations are perceived by native speakers to be of the same (or even higher) quality.
