Automated Text Simplification as a Preprocessing Step for Machine Translation into an Under-resourced Language

In this work, we investigate whether applying a fully automatic text simplification system to the English source can improve machine translation (MT) into an under-resourced language. We use a state-of-the-art automatic text simplification (ATS) system to simplify source sentences lexically and syntactically, and then translate them with two state-of-the-art English-to-Serbian MT systems: a phrase-based MT (PBMT) system and a neural MT (NMT) system. We explore three scenarios for using ATS in MT: (1) using the raw ATS output; (2) automatically filtering out sentences with low grammaticality and meaning preservation scores; and (3) performing minimal manual correction of the ATS output. Our results show an improvement in translation fluency regardless of the chosen scenario. The relative success of the three scenarios, with regard to translation fluency and post-editing effort, differs depending on the MT approach used (PBMT or NMT).
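
As a rough illustration of scenario (2), the filtering step could be sketched as below. This is a minimal sketch under stated assumptions, not the system used in this work: the `SimplifiedSentence` record, the `select_mt_input` function, the numeric score scale, the threshold values, and the choice to fall back to the original sentence when a simplification is rejected are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimplifiedSentence:
    """One source sentence with its ATS output and quality scores (hypothetical record)."""
    original: str
    simplified: str
    grammaticality: float        # assumed: higher means more grammatical
    meaning_preservation: float  # assumed: higher means closer to the original meaning


def select_mt_input(
    items: List[SimplifiedSentence],
    min_grammaticality: float = 3.0,  # assumed threshold, not taken from the paper
    min_meaning: float = 3.0,         # assumed threshold, not taken from the paper
) -> List[str]:
    """Choose, per sentence, which version is passed on to the MT system.

    A simplification is kept only if both scores reach their thresholds.
    Otherwise the original sentence is used (an assumption of this sketch).
    """
    selected = []
    for item in items:
        keep = (
            item.grammaticality >= min_grammaticality
            and item.meaning_preservation >= min_meaning
        )
        selected.append(item.simplified if keep else item.original)
    return selected
```

With such a filter, scenario (1) corresponds to setting both thresholds low enough that every simplification is kept, which makes it easy to compare the two scenarios on the same data.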
