Understanding Pre-Editing for Black-Box Neural Machine Translation

Pre-editing is the process of modifying the source text (ST) so that machine translation (MT) can translate it with higher quality. Despite the unpredictability of black-box neural MT (NMT), pre-editing has been deployed in various practical MT use cases. Although many studies have demonstrated the effectiveness of pre-editing methods in particular settings, a deep understanding of what pre-editing is and how it works for black-box NMT is still lacking. To build such an understanding, we extensively investigated human pre-editing practices. We first implemented a protocol to incrementally record the minimum edits for each ST and collected 6,652 instances of pre-editing across three translation directions, two MT systems, and four text domains. We then analysed the instances from three perspectives: the characteristics of the pre-edited ST, the diversity of pre-editing operations, and the impact of the pre-editing operations on NMT outputs. Our findings include the following: (1) making the meaning and syntactic structure of an ST more explicit is more important for obtaining better translations than making the ST shorter and simpler, and (2) although the impact of pre-editing on NMT is generally unpredictable, the NMT outputs tend to change in certain ways depending on the type of editing operation.
