Paper information - Hjerson: An Open Source Tool for Automatic Error Classification of Machine Translation Output

We describe Hjerson, a tool for automatic classification of errors in machine translation output. The tool detects five word-level error classes: morphological errors, reordering errors, missing words, extra words and lexical errors. As input, the tool requires the original full-form reference translation(s) and hypothesis along with their corresponding base forms. Additional word-level information (e.g. POS tags) can also be supplied in order to obtain more details. The tool provides the raw count and the normalised score (error rate) for each error class at the document level and at the sentence level, as well as the original reference and hypothesis words labelled with the corresponding error class in text and HTML formats.
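To illustrate why both full forms and base forms are needed as input, here is a minimal Python sketch. It is not Hjerson's actual algorithm: the function names `classify_hypothesis_words` and `error_rates`, the label names, the position-independent matching, and normalisation by hypothesis length are all assumptions made for illustration. The idea shown is only that a hypothesis word whose base form matches a reference word, but whose full form does not, can be treated as a candidate morphological error.

```python
from collections import Counter


def classify_hypothesis_words(ref, hyp):
    """Label hypothesis words given lists of (full_form, base_form) pairs.

    Simplified, position-independent rule (an illustrative assumption):
      - full form found among unmatched reference words -> "ok"
      - only the base form found                        -> "morphological"
      - neither found                                   -> "unmatched"
    Returns the labels and the reference words left unmatched
    (candidates for missing words).
    """
    remaining = list(ref)          # reference words not yet matched
    labels = [None] * len(hyp)

    # Pass 1: exact full-form matches take priority.
    for i, (full, _) in enumerate(hyp):
        for j, (ref_full, _) in enumerate(remaining):
            if ref_full == full:
                labels[i] = "ok"
                del remaining[j]   # safe: we break immediately afterwards
                break

    # Pass 2: base-form matches among the leftovers.
    for i, (_, base) in enumerate(hyp):
        if labels[i] is not None:
            continue
        for j, (_, ref_base) in enumerate(remaining):
            if ref_base == base:
                labels[i] = "morphological"
                del remaining[j]
                break
        else:
            labels[i] = "unmatched"  # lexical or extra word candidate

    return labels, remaining


def error_rates(labels):
    """Raw counts per error label, normalised by hypothesis length (assumed)."""
    counts = Counter(label for label in labels if label != "ok")
    return {label: n / len(labels) for label, n in counts.items()}


# Hypothetical toy sentence pair, written as (full form, base form).
reference = [("the", "the"), ("cats", "cat"), ("sleep", "sleep")]
hypothesis = [("the", "the"), ("cat", "cat"), ("dozes", "doze")]

labels, unmatched_ref = classify_hypothesis_words(reference, hypothesis)
print(labels)         # ['ok', 'morphological', 'unmatched']
print(unmatched_ref)  # [('sleep', 'sleep')]
print(error_rates(labels))
```

In this sketch, "cat" is labelled morphological because only its base form matches the reference word "cats". Separating reordering, missing, extra and lexical errors requires alignment information that this example deliberately leaves out.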

Maja Popović
