Document Context Neural Machine Translation with Memory Networks

We present a document-level neural machine translation model that takes both source and target document context into account using memory networks. We cast the task as a structured prediction problem with interdependencies among the observed and hidden variables, i.e., the source sentences and their unobserved target translations in the document. We tackle this problem with a neural translation model equipped with two memory components, one for the source and one for the target, to capture document-level interdependencies. We train the model end-to-end and propose an iterative decoding algorithm based on block coordinate descent. Experimental results and analysis on translating French, German, and Estonian documents into English show that our model effectively exploits both source and target document contexts to generate improved translations.
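The iterative decoding idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the algorithm's details, so the function names (`iterative_decode`, `translate_alone`, `translate_in_context`), the stopping rule, and the toy lexicon below are all assumptions made for illustration. Each target sentence is treated as one block. All blocks are first initialised without document context, and then each is re-decoded in turn while the other sentences' current translations are held fixed.

```python
from typing import Callable, List, Sequence


def iterative_decode(
    source_doc: Sequence[str],
    translate_alone: Callable[[str], str],
    translate_in_context: Callable[[int, Sequence[str], Sequence[str]], str],
    max_passes: int = 3,
) -> List[str]:
    """Coordinate-wise decoding over the sentences of one document.

    Hypothetical interface: `translate_alone` decodes a sentence without
    context; `translate_in_context` re-decodes sentence i given the whole
    source document and the current target translations.
    """
    # Initialise every target sentence independently (no document context).
    targets = [translate_alone(s) for s in source_doc]

    for _ in range(max_passes):
        changed = False
        # Update one block (sentence) at a time, holding the others fixed.
        for i in range(len(source_doc)):
            new_target = translate_in_context(i, source_doc, targets)
            if new_target != targets[i]:
                targets[i] = new_target
                changed = True
        # Stop early once a full pass leaves every translation unchanged.
        if not changed:
            break
    return targets


# Toy stand-ins for a real translation model (assumed, for illustration only).
LEXICON = {"le fleuve": "the river", "la rive": "the bank"}


def toy_alone(src: str) -> str:
    return LEXICON.get(src, src)


def toy_in_context(i: int, sources: Sequence[str], targets: Sequence[str]) -> str:
    base = toy_alone(sources[i])
    others = list(targets[:i]) + list(targets[i + 1:])
    # Disambiguate "bank" when another sentence in the document mentions a river.
    if base == "the bank" and any("river" in t for t in others):
        return "the riverbank"
    return base


result = iterative_decode(["le fleuve", "la rive"], toy_alone, toy_in_context)
print(result)
```

In a real system the two callables would wrap a trained sentence-level model and a context-aware model. The toy versions only show how target-side context from other sentences can change a translation on a later pass.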
