Controlling the Reading Level of Machine Translation Output

Today’s machine translation systems produce the same translation for a given input regardless of who will read it, despite important differences between users. In practice, translations should be tailored to the reader, for instance when translating for children versus a business audience. In this paper, we introduce the task of reading level control in machine translation and report the first results. Our methods can raise or lower the reading level of output translations. In our first approach, source-side sentences in the training corpus are tagged according to the reading level (readability) of the corresponding target sentences. Our second approach modifies the standard encoder–decoder architecture, pairing a shared encoder with separate decoders for simple and complex decoding modes and partitioning the training data by reading level. We demonstrate control over output readability scores on three test sets in the Spanish–English language direction.
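The first approach described above can be sketched as a preprocessing step: score each target-side sentence with a readability formula, then prepend a level tag to the matching source sentence before training. The sketch below uses the Flesch reading ease formula with a crude vowel-group syllable heuristic; the tag names, the threshold, and the syllable counter are illustrative assumptions, not the paper's actual configuration.

```python
import re

def count_syllables(word):
    # Crude heuristic: count contiguous vowel groups, at least one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(sentence):
    # Flesch (1948): 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word).
    # We score one sentence at a time, so words-per-sentence is just the word count.
    words = re.findall(r"[A-Za-z']+", sentence)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) - 84.6 * (syllables / len(words))

def tag_source(src, tgt, threshold=70.0):
    # Prepend a reading-level tag to the source sentence based on the
    # readability of its reference translation (higher score = easier text).
    tag = "<simple>" if flesch_reading_ease(tgt) >= threshold else "<complex>"
    return f"{tag} {src}"
```

At test time, the same tag would be prepended to the input to steer the model toward the desired reading level, in the spirit of side-constraint approaches such as Sennrich et al. (2016).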
