Multitask Models for Controlling the Complexity of Neural Machine Translation

We introduce a machine translation task in which the output is aimed at audiences with different levels of target-language proficiency. We collect a novel dataset of news articles available in English and Spanish and written for diverse reading grade levels. We leverage this dataset to train multitask sequence-to-sequence models that translate Spanish into English targeted at an easier reading grade level than the original Spanish. We show that multitask models outperform pipeline approaches that translate and simplify text independently.
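To make the multitask setup more concrete, the sketch below shows one way training pairs for several related tasks could be drawn from a collection of article versions that differ in language and reading grade level. It is an illustration under stated assumptions, not the procedure used in this work: the `Article` record, its field names, the three task labels, and the `build_multitask_examples` helper are all hypothetical, and how a model would be told which task or target grade applies is deliberately left open.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Article:
    """One version of a news article. All field names are hypothetical."""
    doc_id: str   # shared by every version of the same article
    lang: str     # "es" or "en"
    grade: int    # reading grade level the version was written for
    text: str


def build_multitask_examples(versions: List[Article]) -> List[Dict[str, object]]:
    """Pair up versions of the same article into task-labelled examples.

    Assumed tasks (illustrative only):
      translate           Spanish -> English at the same grade level
      simplify            English -> English at a lower grade level
      translate_simplify  Spanish -> English at a lower grade level
    """
    examples: List[Dict[str, object]] = []
    for src in versions:
        for tgt in versions:
            # Only pair distinct versions of the same article, English targets.
            if src is tgt or src.doc_id != tgt.doc_id or tgt.lang != "en":
                continue
            if src.lang == "es" and tgt.grade == src.grade:
                task = "translate"
            elif src.lang == "es" and tgt.grade < src.grade:
                task = "translate_simplify"
            elif src.lang == "en" and tgt.grade < src.grade:
                task = "simplify"
            else:
                continue  # e.g. English targets at a harder grade level
            examples.append({
                "task": task,
                "target_grade": tgt.grade,
                "source": src.text,
                "target": tgt.text,
            })
    return examples
```

By contrast, a pipeline in the sense used above would train a translation model only on the `translate` pairs and a separate simplification model only on the `simplify` pairs, then chain them at inference time; a multitask model would see all three kinds of pairs during training.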
