Neural Machine Translation With Noisy Lexical Constraints

In neural machine translation, lexically constrained decoding forces the translation output to include constraints predefined by users. When the constraints are perfect, it improves translation quality at the cost of additional decoding overhead. In real-world settings, however, the constraints may contain mistakes, and incorrect constraints undermine lexically constrained decoding. In this article, we propose a novel framework that can improve translation quality even when the constraints are noisy. The key idea is to treat the lexical constraints as external memories: the framework encodes the constraints with a memory encoder and then exploits the resulting memories through a memory integrator. Experiments demonstrate that our framework not only delivers substantial BLEU gains when handling noisy constraints but also achieves a decoding speedup. These results motivate us to apply our models to a new scenario in which the constraints are generated automatically, without the help of users. Experiments show that our models indeed improve translation quality with these automatically generated constraints.
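The encode-then-integrate idea can be illustrated with a minimal sketch. This is not the architecture from the article: the function names `encode_constraints` and `integrate`, the averaging encoder, the dot-product attention, and the scalar gate are all assumptions chosen for illustration. The sketch only shows the general pattern of reading constraints as a soft memory instead of forcing them into the output, so that a learned gate could in principle down-weight a constraint that does not fit the current decoder state.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def encode_constraints(constraints, embed):
    """Hypothetical memory encoder (an assumption, not the article's):
    average the token embeddings of each constraint phrase into one slot."""
    memory = []
    for phrase in constraints:
        vecs = [embed[tok] for tok in phrase]
        dim = len(vecs[0])
        memory.append([sum(v[i] for v in vecs) / len(vecs) for i in range(dim)])
    return memory


def integrate(state, memory, gate_weights):
    """Hypothetical memory integrator (an assumption, not the article's):
    attend over the memory slots, then mix the read vector into the
    decoder state through a scalar sigmoid gate."""
    if not memory:
        return list(state)  # no constraints: leave the state unchanged
    weights = softmax([dot(state, slot) for slot in memory])
    read = [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(len(state))]
    # gate_weights has length 2 * dim and scores the concatenation [state; read]
    gate = 1.0 / (1.0 + math.exp(-dot(gate_weights, state + read)))
    return [(1.0 - gate) * s + gate * r for s, r in zip(state, read)]


# Toy usage with made-up 2-dimensional embeddings.
embed = {"new": [1.0, 0.0], "york": [0.0, 1.0]}
memory = encode_constraints([["new", "york"]], embed)
mixed = integrate([1.0, 0.0], memory, [0.0, 0.0, 0.0, 0.0])
```

In a trained model the gate weights would be learned; here they are set to zero, so the gate is exactly 0.5 and the state and the memory read are mixed equally.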
