Simple and Effective Noisy Channel Modeling for Neural Machine Translation

Previous work on neural noisy channel modeling relied on latent variable models that incrementally process the source and target sentence, which makes decoding decisions based on partial source prefixes even though the full source is available. We pursue an alternative approach based on standard sequence-to-sequence models that utilize the entire source. These models perform remarkably well as channel models, even though they have neither been trained on, nor designed to factor over, incomplete target sentences. Experiments with neural language models trained on billions of words show that noisy channel models can outperform a direct model by up to 3.2 BLEU on WMT'17 German-English translation. We evaluate on four language pairs, and our channel models consistently outperform strong alternatives such as right-to-left reranking models and ensembles of direct models.
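For reference, the noisy channel approach rests on Bayes' rule: the probability of a target sentence y given a source x is rewritten in terms of a channel model p(x|y) and a language model p(y). A minimal sketch of the decomposition, in standard notation (the paper's exact parameterization and normalization may differ):

$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)} \propto p(x \mid y)\, p(y)$$

Since p(x) is constant for a given source, decoding maximizes p(x|y) p(y), which is what lets a language model trained on abundant monolingual data influence translation. The sketch below illustrates one common way to apply this in n-best reranking; the Candidate class, the channel_logprob and lm_logprob callables, and the single interpolation weight lam are hypothetical simplifications, not the paper's exact scoring rule (which may, for instance, length-normalize scores or weight the language model separately).

# Hypothetical sketch of noisy channel n-best reranking, under the
# assumptions stated above.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Candidate:
    target: str
    direct_logprob: float  # log p(y|x) from the direct model's beam search

def rerank(
    source: str,
    candidates: List[Candidate],
    channel_logprob: Callable[[str, str], float],  # log p(x|y), channel model
    lm_logprob: Callable[[str], float],            # log p(y), language model
    lam: float = 1.0,  # assumed interpolation weight, tuned on held-out data
) -> Candidate:
    """Return the candidate with the highest combined noisy channel score."""
    def score(c: Candidate) -> float:
        return c.direct_logprob + lam * (
            channel_logprob(source, c.target) + lm_logprob(c.target)
        )
    return max(candidates, key=score)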
