Semantic Structural Decomposition for Neural Machine Translation

Building on recent advances in semantic parsing and text simplification, we investigate the use of semantic splitting of the source sentence as preprocessing for machine translation. We experiment with a Transformer model and evaluate using large-scale crowd-sourcing experiments. Results show a significant increase in fluency on long sentences in an English-to-French setting with a training corpus of 5M sentence pairs, while retaining comparable adequacy. We also perform a manual analysis that explores the tradeoff between adequacy and fluency when all sentence lengths are considered.
