Discriminative Reranking for Neural Machine Translation

Reranking models enable the integration of rich features to select a better output hypothesis from an n-best list or lattice. These models have a long history in NLP, and we revisit discriminative reranking for modern neural machine translation by training a large transformer architecture that takes as input the source sentence together with a list of hypotheses and outputs a ranked list. The reranker is trained to predict the observed distribution of a desired metric, e.g. BLEU, over the n-best list. Since such a discriminator contains hundreds of millions of parameters, we improve its generalization using pre-training and data augmentation techniques. Experiments on four WMT directions show that our discriminative reranking approach is effective and complementary to existing generative reranking approaches, yielding improvements of up to 4 BLEU over the beam search output.
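The following is a minimal sketch of the kind of training objective described above, assuming the target distribution is obtained by softmax-normalizing sentence-level BLEU over the n-best list and the reranker is fit with a KL-divergence loss; the function and parameter names (reranker_loss, temperature) are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def reranker_loss(model_scores, hyp_bleu, temperature=1.0):
    """KL divergence between the reranker's distribution over an n-best list
    and a target distribution induced by sentence-level BLEU (assumed softmax
    normalization; the actual normalization may differ).

    model_scores: (n,) tensor of unnormalized reranker scores, one per hypothesis
    hyp_bleu:     (n,) tensor of sentence-level BLEU scores for the same hypotheses
    """
    # Target distribution: normalize the metric over the n-best list.
    target = F.softmax(hyp_bleu / temperature, dim=-1)
    # Reranker prediction as log-probabilities over the same hypotheses.
    log_pred = F.log_softmax(model_scores, dim=-1)
    # KL(target || prediction); equivalent to cross-entropy up to a constant.
    return F.kl_div(log_pred, target, reduction="sum")
```

At inference time, the hypothesis with the highest reranker score would simply be selected from the n-best list (or its score combined with the generative model's score before selection).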
