Predicting Post-Editing Effort for English-Chinese Neural Machine Translation

Although machine translation (MT) has made great progress in recent years, the output of MT systems still needs human post-editing (PE) to meet the quality requirements of many translation tasks. Measuring the PE effort required by an MT output is a key problem in PE research. PE time reflects the overall PE effort well and is widely used in practice. This paper studies methods for automatically predicting PE time. We manually post-edited English-Chinese neural machine translation (NMT) output in three domains, recording the PE time and annotating the MT error types. Using this corpus, we took error types as features and trained machine learning models to predict PE time. Because the error types in MT output are unavailable in real PE scenarios, we also propose a prediction method based on the translation quality estimation (QE) framework that makes predictions without error information. Experimental results show that both methods achieve a higher correlation with the actual PE time than the baseline method.
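To make the error-type-based approach concrete, the following is a minimal sketch, not the models used in this paper. It assumes a hypothetical three-label error inventory (`ERROR_TYPES`), represents each segment by its error-type counts, fits an ordinary least-squares regressor as a stand-in for the machine learning models, and scores predictions with the Pearson correlation. The six segments and their PE times are synthetic, noise-free toy values chosen only for illustration; they are not drawn from the corpus.

```python
from collections import Counter
from math import sqrt

# Hypothetical error-type inventory (an assumption, not the paper's taxonomy).
ERROR_TYPES = ["mistranslation", "omission", "word_order"]


def featurize(errors):
    """Map a list of annotated error labels to [bias, count per error type]."""
    counts = Counter(errors)
    return [1.0] + [float(counts[t]) for t in ERROR_TYPES]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_least_squares(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    d = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


def predict(w, x):
    """Predicted PE time for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


# Synthetic toy segments: (annotated error labels, PE time in seconds).
segments = [
    ([], 5.0),
    (["mistranslation"], 15.0),
    (["omission"], 11.0),
    (["word_order"], 8.0),
    (["mistranslation", "omission"], 21.0),
    (["mistranslation", "mistranslation", "word_order"], 28.0),
]

X = [featurize(errors) for errors, _ in segments]
times = [t for _, t in segments]
w = fit_least_squares(X, times)        # learned per-error-type time costs
preds = [predict(w, x) for x in X]     # predictions on the training segments
r = pearson(preds, times)              # agreement with the recorded PE time
```

In practice the correlation would be computed on held-out segments rather than on the training data, and the linear model could be replaced by any regressor that accepts the same count features.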