On the Inference Calibration of Neural Machine Translation

Confidence calibration, which aims to make a model's predicted probabilities match the true likelihood of correctness, is important for neural machine translation (NMT) because it can offer useful indicators of translation errors in the generated output. While prior studies have shown that NMT models trained with label smoothing are well calibrated on the ground-truth training data, we find that miscalibration remains a severe challenge for NMT during inference, due to the discrepancy between training and inference. Through carefully designed experiments on three language pairs, our work provides in-depth analyses of the correlation between calibration and translation performance, as well as the linguistic properties of miscalibration, and reports a number of interesting findings that may help researchers better analyze, understand, and improve NMT models. Based on these observations, we further propose a graduated label smoothing method that improves both inference calibration and translation performance.
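Calibration in this sense is commonly quantified with the expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between each bin's mean confidence and its empirical accuracy is averaged, weighted by bin size. A minimal sketch (the binning scheme and toy data below are illustrative, not the paper's exact setup):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: size-weighted average of |mean confidence - accuracy|
    over equal-width confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()        # empirical accuracy in this bin
            conf = confidences[mask].mean()   # mean predicted confidence
            ece += mask.mean() * abs(acc - conf)
    return ece

# Toy example of perfect calibration: 90% confidence, 90% accuracy.
confs = [0.9] * 10
hits = [1] * 9 + [0]
print(expected_calibration_error(confs, hits))  # 0.0
```

A well-calibrated model yields ECE near zero; the paper's point is that this holds for label-smoothed NMT when conditioning on ground-truth prefixes, but degrades when the model conditions on its own generated output at inference time.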
