Strategy-Based Technology for Estimating MT Quality

This paper introduces our SAU-KERC system, which achieved an F1 score of 0.39 in the word-level quality estimation task at WMT2015. The goal of the task is to assign each translated word an “OK” or “BAD” label indicating its translation quality. We adopt a sequence labeling model, conditional random fields (CRF), to predict the labels. Because “BAD” labels are rare in the training and development sets, the recognition rate for “BAD” is low. To address this problem, we propose two strategies. The first replaces the “OK” label with sub-labels to balance the label distribution. The second reconstructs the training set so that it contains a larger proportion of “BAD” words.
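
The two strategies can be illustrated with a minimal sketch. The abstract does not specify how the sub-labels are defined or how the training set is reconstructed, so both schemes below are assumptions made for illustration only: “OK” is split by a hypothetical coarse tag attached to each word, and the training set is rebuilt by keeping only sentences that contain at least one “BAD” word. The function names and the toy corpus are likewise hypothetical.

```python
from collections import Counter

def relabel_ok(tags, labels):
    """Strategy 1 (assumed scheme): split "OK" into sub-labels keyed by a coarse tag."""
    return [f"OK_{t}" if y == "OK" else y for t, y in zip(tags, labels)]

def collapse(label):
    """Map any predicted sub-label back to the original two-way label."""
    return "OK" if label.startswith("OK") else "BAD"

def keep_bad_sentences(corpus):
    """Strategy 2 (assumed scheme): keep only sentences with at least one "BAD" word."""
    return [(tags, labels) for tags, labels in corpus if "BAD" in labels]

# Toy data: each sentence is (coarse tags, gold labels); the tags are hypothetical.
corpus = [
    (["DET", "NOUN", "VERB"], ["OK", "OK", "BAD"]),
    (["NOUN", "VERB"], ["OK", "OK"]),
    (["DET", "NOUN", "VERB", "NOUN"], ["OK", "BAD", "OK", "OK"]),
]

# Label distribution before any change.
before = Counter(y for _, labels in corpus for y in labels)

# Apply strategy 2, then strategy 1, and recount the labels.
rebuilt = [(tags, relabel_ok(tags, labels)) for tags, labels in keep_bad_sentences(corpus)]
after = Counter(y for _, labels in rebuilt for y in labels)

print("before:", dict(before))
print("after: ", dict(after))
```

In this toy corpus, “BAD” is no longer dominated by a single majority label once “OK” is spread over several sub-labels. A CRF trained on the sub-labels would have its predictions passed through `collapse` before evaluation against the original two-way labels.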