A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU

BLEU is the de facto standard machine translation (MT) evaluation metric. However, because BLEU computes a geometric mean of n-gram precisions, it often correlates poorly with human judgment at the sentence level. Therefore, several smoothing techniques have been proposed. This paper systematically compares 7 smoothing techniques for sentence-level BLEU. Three of them are first proposed in this paper, and they correlate better with human judgments at the sentence level than the other smoothing techniques. Moreover, we also compare the performance of the 7 smoothing techniques when used in statistical machine translation tuning.
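
To make the geometric-mean issue concrete, the following minimal sketch computes a single-reference sentence-level BLEU score and shows that one zero n-gram precision drives the whole score to zero. The function names (`ngrams`, `modified_precision`, `sentence_bleu`) and the `add_one` flag are illustrative assumptions, and the add-one adjustment for n > 1 is a generic smoothing used here only as an example; it is not claimed to be any particular one of the 7 techniques compared in this paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hyp, ref, n):
    """Return (clipped matches, total hypothesis n-grams) for one reference."""
    hyp_counts = ngrams(hyp, n)
    ref_counts = ngrams(ref, n)
    matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    total = sum(hyp_counts.values())
    return matches, total


def sentence_bleu(hyp, ref, max_n=4, add_one=False):
    """Sentence-level BLEU with uniform weights and a single reference.

    add_one=True is an illustrative assumption: it adds 1 to the numerator
    and denominator of every precision with n > 1.
    """
    log_sum = 0.0
    for n in range(1, max_n + 1):
        matches, total = modified_precision(hyp, ref, n)
        if add_one and n > 1:
            matches, total = matches + 1, total + 1
        if matches == 0 or total == 0:
            # A single zero precision makes the geometric mean zero.
            return 0.0
        log_sum += math.log(matches / total)
    # Brevity penalty: 1 if the hypothesis is longer than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_sum / max_n)


hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()

print(sentence_bleu(hyp, ref))                # 0.0: no 4-gram matches
print(sentence_bleu(hyp, ref, add_one=True))  # about 0.49
```

In this example the hypothesis differs from the reference by one word, yet the unsmoothed score is zero because no 4-gram matches. This is the kind of behavior that sentence-level smoothing is meant to avoid.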