Modeling Future Cost for Neural Machine Translation

Existing neural machine translation (NMT) systems use sequence-to-sequence neural networks to generate the target translation word by word, training the model to make the word generated at each time-step as consistent as possible with its counterpart in the reference. However, the trained translation model tends to focus on the accuracy of the target word generated at the current time-step and does not consider its future cost, i.e., the expected cost of generating the subsequent target translation (the next target word). To address this issue, we propose a simple and effective method to model the future cost of each target word for NMT systems. Specifically, a time-dependent future cost is estimated from the currently generated target word and its contextual information to improve the training of the NMT model. Furthermore, the learned future context representation at the current time-step is used to help generate the next target word during decoding. Experimental results on three widely used translation datasets, WMT14 German-to-English, WMT14 English-to-French, and WMT17 Chinese-to-English, show that the proposed approach achieves significant improvements over a strong Transformer-based NMT baseline.
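The training objective sketched in the abstract, a per-time-step loss augmented with an estimated future cost, can be illustrated as follows. This is a minimal sketch, not the paper's actual implementation: the fusion of the hidden state with the generated word's embedding, the scoring vector `w`, and the weight `alpha` are all hypothetical simplifications of the method described above.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target_idx):
    # Standard per-word negative log-likelihood, as in ordinary NMT training.
    return -math.log(softmax(logits)[target_idx])

def future_cost(hidden, target_emb, w):
    # Hypothetical time-dependent future cost: fuse the current decoder
    # hidden state with the embedding of the word just generated, then
    # score the expected cost of producing the remaining translation.
    fused = [h + e for h, e in zip(hidden, target_emb)]
    score = sum(wi * fi for wi, fi in zip(w, fused))
    return math.log(1.0 + math.exp(score))  # softplus keeps the cost positive

def training_loss(logits, target_idx, hidden, target_emb, w, alpha=0.5):
    # Per-time-step loss: usual NLL plus the weighted estimated future cost.
    return cross_entropy(logits, target_idx) + alpha * future_cost(hidden, target_emb, w)
```

In an actual system the future-cost estimator would be a learned network trained jointly with the Transformer, and its context representation would also feed into the next decoding step; the sketch only shows how the extra term attaches to the standard word-level loss.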
