Automatic Translation Error Analysis

We propose a method for automatically identifying various error types in machine translation (MT) output. The approach is based mainly on a monolingual word alignment of the hypothesis and the reference translation. In addition to common lexical errors, misplaced words are also detected. A comparison with manually classified MT errors is presented. Our error classification is inspired by that of Vilar et al. (2006; [19]), although distinguishing some of their categories is beyond the reach of the current version of our system.
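To make the alignment-based idea concrete, here is a minimal sketch of how error categories can fall out of a monolingual hypothesis-reference alignment. It is not the method described above: the greedy exact-match aligner, the use of a longest increasing subsequence to decide which aligned words count as "misplaced", and all function names are hypothetical choices made only for illustration. Input is assumed to be already tokenized.

```python
# Hypothetical illustration: error categories derived from a monolingual
# hypothesis-reference word alignment. The exact-match greedy aligner and the
# longest-increasing-subsequence criterion for "misplaced" are assumptions.

def align_exact(hyp, ref):
    """Greedy one-to-one alignment of identical tokens, left to right."""
    positions = {}
    for j, tok in enumerate(ref):
        positions.setdefault(tok, []).append(j)
    alignment = []  # list of (hyp_index, ref_index), in hypothesis order
    for i, tok in enumerate(hyp):
        candidates = positions.get(tok)
        if candidates:
            # Link to the leftmost still-unused reference token with the same form.
            alignment.append((i, candidates.pop(0)))
    return alignment


def longest_increasing_subsequence(seq):
    """Return the indices of one longest strictly increasing subsequence of seq."""
    n = len(seq)
    if n == 0:
        return []
    length = [1] * n  # length[k]: best subsequence length ending at position k
    prev = [-1] * n   # back-pointer for reconstructing the subsequence
    for k in range(n):
        for m in range(k):
            if seq[m] < seq[k] and length[m] + 1 > length[k]:
                length[k] = length[m] + 1
                prev[k] = m
    k = max(range(n), key=lambda x: length[x])
    result = []
    while k != -1:
        result.append(k)
        k = prev[k]
    return result[::-1]


def classify_errors(hyp, ref):
    """Label unaligned and out-of-order tokens as missing, extra, or misplaced."""
    alignment = align_exact(hyp, ref)
    aligned_hyp = {i for i, _ in alignment}
    aligned_ref = {j for _, j in alignment}
    # Aligned words outside the longest order-preserving chain count as misplaced.
    in_order = set(longest_increasing_subsequence([j for _, j in alignment]))
    misplaced = [hyp[i] for k, (i, _) in enumerate(alignment) if k not in in_order]
    extra = [tok for i, tok in enumerate(hyp) if i not in aligned_hyp]
    missing = [tok for j, tok in enumerate(ref) if j not in aligned_ref]
    return {"missing": missing, "extra": extra, "misplaced": misplaced}


if __name__ == "__main__":
    ref = "he quickly read the book".split()
    hyp = "he read the book quickly yesterday".split()
    print(classify_errors(hyp, ref))
    # prints {'missing': [], 'extra': ['yesterday'], 'misplaced': ['quickly']}
```

Because this sketch links only identical surface forms, an inflected or otherwise altered word would show up as one "extra" plus one "missing" token; a fuller aligner would need inexact links to tell such lexical errors apart.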

[1] Jianfeng Gao et al. Indirect-HMM-based Hypothesis Alignment for Combining Outputs from Machine Translation Systems, 2008, EMNLP.

[2] Philipp Koehn et al. Proceedings of the Fourth Workshop on Statistical Machine Translation, WMT@EACL 2009, Athens, Greece, March 30-31, 2009, 2009, WMT@EACL.

[3] Ondrej Bojar et al. Analyzing Error Types in English-Czech Machine Translation, 2011, Prague Bull. Math. Linguistics.

[4] Philipp Koehn et al. Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, WMT@ACL 2010, Uppsala, Sweden, July 15-16, 2010, 2010, WMT@ACL.

[5] Philipp Koehn et al. Findings of the 2009 Workshop on Statistical Machine Translation, 2009, WMT@EACL.

[6] José B. Mariño et al. Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair, 2011, Lang. Resour. Evaluation.

[7] Khalil Sima'an et al. Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), 2008.

[8] Lluís Màrquez i Villodre et al. Towards Heterogeneous Automatic MT Error Analysis, 2008, LREC.

[9] Hermann Ney et al. Word Error Rates: Decomposition over POS classes and Applications for Error Analysis, 2007, WMT@ACL.

[10] Alon Lavie et al. Extending the METEOR Machine Translation Evaluation Metric to the Phrase Level, 2010, NAACL.

[11] John DeNero et al. Discriminative Modeling of Extraction Sets for Machine Translation, 2010, ACL.

[12] Ondrej Bojar et al. Quiz-Based Evaluation of Machine Translation, 2011, Prague Bull. Math. Linguistics.

[13] Claire Grover et al. Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC), 2006.

[14] José B. Mariño et al. Morpho-syntactic Information for Automatic Error Analysis of Statistical Machine Translation Output, 2006, WMT@HLT-NAACL.

[15] Jörg Tiedemann. Word to word alignment strategies, 2004, COLING.

[16] Philipp Koehn et al. Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation, 2010, WMT@ACL.

[17] Ondrej Bojar et al. Czech-English Word Alignment, 2006, LREC.

[18] José B. Mariño et al. System Combination for Machine Translation of Spoken and Written Language, 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[19] Hermann Ney et al. Error Analysis of Statistical Machine Translation Output, 2006, LREC.

[20] Alexandra Birch et al. Metrics for MT evaluation: evaluating reordering, 2010, Machine Translation.

[21] Hermann Ney et al. HMM-Based Word Alignment in Statistical Translation, 1996, COLING.