Analyzing Uncertainty in Neural Machine Translation

Machine translation is a popular test bed for research in neural sequence-to-sequence models, but despite much recent research, these models are still poorly understood. Practitioners report performance degradation with large beams, underestimation of rare words, and a lack of diversity in the final translations. Our study relates some of these issues to the inherent uncertainty of the task, due to the existence of multiple valid translations for a single source sentence, and to the extrinsic uncertainty caused by noisy training data. We propose tools and metrics to assess how uncertainty in the data is captured by the model distribution and how it affects the search strategies that generate translations. Our results show that search works remarkably well, but that models tend to spread too much probability mass over the hypothesis space. Next, we propose tools to assess model calibration and show how to easily fix some shortcomings of current models. As part of this study, we release multiple human reference translations for two popular benchmarks.
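The calibration issue mentioned above can be illustrated with temperature scaling, a standard post-hoc fix from the calibration literature (this is a generic sketch, not necessarily the paper's own remedy). A temperature above 1 flattens an overconfident output distribution without changing the argmax:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; temperature > 1 flattens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def confidence(logits, temperature=1.0):
    """Model confidence = probability assigned to the argmax class."""
    return max(softmax(logits, temperature))

# Toy logits for a single token prediction from an overconfident model.
logits = [4.0, 1.0, 0.5]
print(confidence(logits))                    # sharp distribution at T = 1
print(confidence(logits, temperature=2.0))   # softened by temperature scaling
```

In practice the temperature is a single scalar fit on held-out data by minimizing negative log-likelihood, so the ranking of hypotheses is preserved while reported confidences better match empirical accuracy.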
