Discriminative Reranking for Neural Machine Translation

Reranking models enable the integration of rich features to select a better output hypothesis from an n-best list or lattice. These models have a long history in NLP, and we revisit discriminative reranking for modern neural machine translation by training a large transformer architecture that takes as input the source sentence together with a list of hypotheses and outputs a ranked list. The reranker is trained to predict the observed distribution of a desired metric, e.g. BLEU, over the n-best list. Since such a discriminator contains hundreds of millions of parameters, we improve its generalization using pre-training and data augmentation techniques. Experiments on four WMT directions show that our discriminative reranking approach is effective and complementary to existing generative reranking approaches, yielding improvements of up to 4 BLEU over the beam search output.
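The following is a minimal sketch of the kind of training objective described above, assuming the target distribution is obtained by softmax-normalizing sentence-level BLEU over the n-best list and the reranker is fit with a KL-divergence loss; the function and parameter names (reranker_loss, temperature) are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def reranker_loss(model_scores, hyp_bleu, temperature=1.0):
    """KL divergence between the reranker's distribution over an n-best list
    and a target distribution induced by sentence-level BLEU (assumed softmax
    normalization; the actual normalization may differ).

    model_scores: (n,) tensor of unnormalized reranker scores, one per hypothesis
    hyp_bleu:     (n,) tensor of sentence-level BLEU scores for the same hypotheses
    """
    # Target distribution: normalize the metric over the n-best list.
    target = F.softmax(hyp_bleu / temperature, dim=-1)
    # Reranker prediction as log-probabilities over the same hypotheses.
    log_pred = F.log_softmax(model_scores, dim=-1)
    # KL(target || prediction); equivalent to cross-entropy up to a constant.
    return F.kl_div(log_pred, target, reduction="sum")
```

At inference time, the hypothesis with the highest reranker score would simply be selected from the n-best list (or its score combined with the generative model's score before selection).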
