Detecting Grammatical Errors in Machine Translation Output Using Dependency Parsing and Treebank Querying

Despite recent advances in machine translation (MT), MT systems cannot guarantee that the sentences they produce are fluent and coherent in both syntax and semantics. Detecting and highlighting errors in machine-translated sentences can help post-editors focus on the erroneous fragments that need correction. This paper presents two methods for detecting grammatical errors in Dutch machine-translated text, using dependency parsing and treebank querying. We test our approach on the output of a statistical and a rule-based MT system for English-Dutch and evaluate its performance at the sentence and word level. The results show that our approach can detect grammatical errors with high accuracy at the sentence level in both types of MT output.
