Using Parallel Features in Parsing of Machine-Translated Sentences for Correction of Grammatical Errors

In this paper, we present two dependency parser training methods suited to parsing the output of statistical machine translation (SMT), which poses problems for standard parsers because it is frequently ungrammatical. We adapt the MST parser by exploiting additional features from the source language and by introducing artificial grammatical errors into the parser training data, so that the training sentences resemble SMT output. We evaluate the modified parser on DEPFIX, a system that improves English-Czech SMT output by applying automatic rule-based corrections of grammatical mistakes and that requires parsed SMT output sentences as its input. Both parser modifications led to improvements in BLEU score; their combination was evaluated manually and showed a statistically significant improvement in translation quality.
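
The second modification, introducing artificial grammatical errors into the parser training data, can be illustrated with a minimal sketch. The abstract does not say how the errors are generated, so everything below is an assumption made for illustration: the token layout, the `alt_forms` table of alternative inflected forms, the `error_rate` parameter, and the function name `inject_errors` are all hypothetical. The sketch only shows the general idea that word forms are perturbed while the gold dependency annotation is kept, so that a parser trained on the result sees input resembling ungrammatical SMT output.

```python
import random


def inject_errors(sentence, alt_forms, error_rate=0.1, rng=None):
    """Return a noisy copy of a parsed training sentence.

    sentence   -- list of dicts with keys "form", "lemma", "head", "deprel"
                  (a hypothetical, CoNLL-like token layout)
    alt_forms  -- dict mapping a lemma to a list of its inflected forms
                  (hypothetical resource; not described in the abstract)
    error_rate -- probability of corrupting each token that has alternatives
    rng        -- optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    noisy = []
    for token in sentence:
        new_token = dict(token)  # copy, so the input sentence is not mutated
        candidates = [
            form for form in alt_forms.get(token["lemma"], [])
            if form != token["form"]
        ]
        if candidates and rng.random() < error_rate:
            # Swap in a different inflection of the same lemma; the gold
            # head and dependency label are deliberately left unchanged.
            new_token["form"] = rng.choice(candidates)
        noisy.append(new_token)
    return noisy


# Toy data for illustration only: "Vidím velkou knihu" (I see a big book).
toy_sentence = [
    {"form": "Vidím", "lemma": "vidět", "head": 0, "deprel": "Pred"},
    {"form": "velkou", "lemma": "velký", "head": 3, "deprel": "Atr"},
    {"form": "knihu", "lemma": "kniha", "head": 1, "deprel": "Obj"},
]
toy_alt_forms = {
    "velký": ["velká", "velkou", "velkého"],
    "kniha": ["kniha", "knihu", "knihy"],
}

noisy_sentence = inject_errors(
    toy_sentence, toy_alt_forms, error_rate=0.5, rng=random.Random(0)
)
```

Keeping the tree fixed while changing only the surface forms is a design choice of this sketch: it lets the noisy sentences reuse the existing treebank annotation without re-annotation. A realistic setup would presumably choose the error types and rates to match those observed in actual SMT output.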
