Context-Aware Neural Machine Translation Decoding

This work presents a decoding architecture that fuses the information from a neural translation model with the context semantics captured by a semantic space language model based on word embeddings. The method extends the beam search decoding process and can therefore be applied to any neural machine translation framework. This sidesteps two drawbacks of current document-level systems: (i) the training process is not modified, so training time does not increase, and (ii) no document-level annotated data is required. We analyze the impact of the fusion approach and its parameters on the final translation quality for English–Spanish. We obtain consistent and statistically significant improvements in terms of BLEU and METEOR, and we observe that the fused systems handle synonyms to propose more adequate translations and help the system disambiguate among several translation candidates for a word.
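
To make the idea of fusing scores inside beam search concrete, the following is a minimal sketch and not the implementation evaluated here. It assumes that the context is represented by the mean embedding of previously seen words, that the semantic score is the cosine similarity between a candidate's embedding and that context vector, and that the two scores are combined by adding a weighted similarity to the NMT log-probability. The names `fused_scores`, `beam_step`, `lam`, and the toy embeddings and probabilities are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def context_vector(context_words, embeddings):
    """Mean embedding of the context words that have an embedding (assumption)."""
    vecs = [embeddings[w] for w in context_words if w in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def fused_scores(nmt_log_probs, context_words, embeddings, lam=0.5):
    """Add lam * cosine(candidate, context) to each candidate's NMT log-probability.

    The additive form and the weight `lam` are illustrative assumptions.
    """
    ctx = context_vector(context_words, embeddings)
    scores = {}
    for word, logp in nmt_log_probs.items():
        sim = 0.0
        if ctx is not None and word in embeddings:
            sim = cosine(embeddings[word], ctx)
        scores[word] = logp + lam * sim
    return scores


def beam_step(beams, expand, context_words, embeddings, beam_size=2, lam=0.5):
    """One beam search step: expand every hypothesis, rescore, keep the best.

    `beams` is a list of (prefix, score) pairs; `expand(prefix)` is a
    hypothetical callback returning {next_word: nmt_log_prob}.
    """
    candidates = []
    for prefix, score in beams:
        fused = fused_scores(expand(prefix), context_words, embeddings, lam)
        for word, s in fused.items():
            candidates.append((prefix + [word], score + s))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_size]


# Toy data (invented for illustration): translating English "bank" into Spanish
# when the preceding context is about money.
toy_embeddings = {
    "dinero": [1.0, 0.0],
    "cuenta": [0.9, 0.1],
    "banco": [1.0, 0.1],   # financial institution
    "orilla": [0.0, 1.0],  # river bank
}
toy_context = ["dinero", "cuenta"]
toy_nmt = {"banco": math.log(0.45), "orilla": math.log(0.55)}

without_context = fused_scores(toy_nmt, toy_context, toy_embeddings, lam=0.0)
with_context = fused_scores(toy_nmt, toy_context, toy_embeddings, lam=0.5)
best_without = max(without_context, key=without_context.get)
best_with = max(with_context, key=with_context.get)
top_beam = beam_step([([], 0.0)], lambda prefix: toy_nmt,
                     toy_context, toy_embeddings, beam_size=1, lam=0.5)
```

In this toy setting the translation model alone slightly prefers one candidate, while the context term shifts the ranking toward the candidate that is semantically closer to the surrounding words. Because the rescoring happens only at decoding time, a sketch of this kind leaves the trained model untouched.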
