A Study on Automatic Scoring for Machine Translation Systems

Three string similarity measures, edit distance, cosine correlation, and the Dice coefficient, are adopted to evaluate machine translation output. Experiments show that this evaluation method distinguishes well between "good" and "bad" translations. A further experiment demonstrates consistency between human and automatic scores for six general-purpose MT systems, and an analysis of the scoring formulas supports the experimental findings. Although the data and graphs are promising, correlation coefficients and significance tests at the 0.01 level are computed to confirm the reliability of the results. Finally, linear regression is used to map the automatic scores onto the human scores.
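
As an illustration only, the following is a minimal sketch of how the three similarity measures named above might be computed between a candidate translation and a reference translation. It assumes token-level comparison and simple frequency vectors; the paper's actual formulations (character vs. word granularity, weighting, normalization) may differ.

```python
# Hypothetical sketch of the three string similarity measures named in the
# abstract: edit distance, cosine correlation, and Dice coefficient.
# Token-level granularity is assumed here; not taken from the paper itself.
from collections import Counter
from math import sqrt


def edit_distance(a, b):
    """Levenshtein distance between token sequences a and b."""
    m, n = len(a), len(b)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[m][n]


def cosine_similarity(a, b):
    """Cosine of the angle between the token-frequency vectors of a and b."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def dice_coefficient(a, b):
    """Dice coefficient over the token sets of a and b."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


if __name__ == "__main__":
    candidate = "the cat sat on the mat".split()
    reference = "a cat is sitting on the mat".split()
    print(edit_distance(candidate, reference))     # lower is more similar
    print(cosine_similarity(candidate, reference)) # higher is more similar
    print(dice_coefficient(candidate, reference))  # higher is more similar
```

Note that edit distance decreases as the candidate approaches the reference, whereas cosine correlation and the Dice coefficient increase, so a combined score would need to normalize or invert the edit distance before comparison with human judgments.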