Paper information - Detecting Grammatical Errors in Machine Translation Output Using Dependency Parsing and Treebank Querying

Despite recent advances in machine translation (MT), MT systems cannot guarantee that the sentences they produce will be fluent and coherent, both syntactically and semantically. Detecting and highlighting errors in machine-translated sentences can help post-editors focus on the erroneous fragments that need to be corrected. This paper presents two methods for detecting grammatical errors in Dutch machine-translated text, using dependency parsing and treebank querying. We test our approach on the output of a statistical and a rule-based MT system for English-Dutch and evaluate its performance at the sentence and word levels. The results show that our method can detect grammatical errors with high accuracy at the sentence level in both types of MT output.
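The abstract does not specify how the treebank is queried. The Python sketch below is a hypothetical illustration of one simple way treebank querying can flag grammatical errors, not the authors' implementation. It assumes each parsed sentence is a list of `Token` objects, reduces every dependency arc to a (head tag, relation, dependent tag) pattern, counts those patterns in a reference treebank, and flags the words on any arc seen fewer than `min_freq` times. A sentence counts as erroneous if any of its words is flagged, and word-level output is scored with precision, recall and F1. `Token`, the number-marked tags, the relation labels, the toy Dutch sentences and the threshold are all invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Token:
    word: str
    pos: str   # hypothetical tag that also encodes number, e.g. "NOUN-pl"
    head: int  # index of the head token; -1 marks the root
    rel: str   # dependency relation label (illustrative)


Pattern = Tuple[str, str, str]  # (head tag, relation, dependent tag)


def arc_patterns(sent: List[Token]) -> List[Tuple[int, Pattern]]:
    """Return (dependent index, pattern) for every non-root arc."""
    return [(i, (sent[t.head].pos, t.rel, t.pos))
            for i, t in enumerate(sent) if t.head >= 0]


def reference_counts(treebank: List[List[Token]]) -> Counter:
    """Count how often each arc pattern occurs in a reference treebank."""
    counts: Counter = Counter()
    for sent in treebank:
        counts.update(p for _, p in arc_patterns(sent))
    return counts


def flag_tokens(sent: List[Token], counts: Counter, min_freq: int = 1) -> Set[int]:
    """Word level: flag head and dependent of every arc seen fewer than min_freq times."""
    flagged: Set[int] = set()
    for dep, pattern in arc_patterns(sent):
        if counts[pattern] < min_freq:  # Counter returns 0 for unseen patterns
            flagged.update({dep, sent[dep].head})
    return flagged


def sentence_is_erroneous(sent: List[Token], counts: Counter, min_freq: int = 1) -> bool:
    """Sentence level: a sentence is flagged if any of its words is flagged."""
    return bool(flag_tokens(sent, counts, min_freq))


def word_level_prf(pred: Set[int], gold: Set[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of flagged token indices against gold error indices."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy "reference treebank" of two grammatical Dutch sentences (invented data).
REFERENCE = [
    [Token("de", "DET", 1, "det"), Token("kat", "NOUN-sg", 2, "su"),
     Token("slaapt", "VERB-sg", -1, "root")],
    [Token("de", "DET", 1, "det"), Token("katten", "NOUN-pl", 2, "su"),
     Token("slapen", "VERB-pl", -1, "root")],
]

# Toy MT output with a subject-verb agreement error: "de katten slaapt".
MT_OUTPUT = [Token("de", "DET", 1, "det"), Token("katten", "NOUN-pl", 2, "su"),
             Token("slaapt", "VERB-sg", -1, "root")]

if __name__ == "__main__":
    counts = reference_counts(REFERENCE)
    flagged = flag_tokens(MT_OUTPUT, counts)
    print(sorted(flagged))                           # [1, 2]
    print(sentence_is_erroneous(MT_OUTPUT, counts))  # True
    print(word_level_prf(flagged, gold={2}))         # (0.5, 1.0, 0.6666666666666666)
```

Flagging both ends of an unattested arc is a deliberate simplification in this sketch: it catches the wrong verb form (recall 1.0) but also marks the correct plural subject, which halves word-level precision. This illustrates why word-level detection is harder than sentence-level detection.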

Véronique Hoste | Arda Tezcan | Lieve Macken