Automated Grammar Correction Using Hierarchical Phrase-Based Statistical Machine Translation

We introduce a novel technique that uses hierarchical phrase-based statistical machine translation (SMT) for grammar correction. SMT systems provide a uniform platform for any sequence transformation task. Thus grammar correction can be considered a translation problem from incorrect text to correct text. Over the years, grammar correction data in the electronic form (i.e., parallel corpora of incorrect and correct sentences) has increased manifolds in quality and quantity, making SMT systems feasible for grammar correction. Firstly, sophisticated translation models like hierarchical phrase-based SMT can handle errors as complicated as reordering or insertion, which were difficult to deal with previously throuh the mediation of rule based systems. Secondly, this SMT based correction technique is similar in spirit to human correction, because the system extracts grammar rules from the corpus and later uses these rules to translate incorrect sentences to correct sentences. We describe how to use Joshua, a hierarchical phrase-based SMT system for grammar correction. An accuracy of 0.77 (BLEU score) establishes the efficacy of our approach.

[1]  Matt Post,et al.  Joshua 4.0: Packing, PRO, and Paraphrases , 2012, WMT@NAACL-HLT.

[2]  Rachele De Felice,et al.  A Classifier-Based Approach to Preposition and Determiner Error Correction in L2 English , 2008, COLING.

[3]  Hwee Tou Ng,et al.  Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English , 2013, BEA@NAACL-HLT.

[4]  Martin Chodorow,et al.  An Unsupervised Method for Detecting Grammatical Errors , 2000, ANLP.

[5]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[6]  Omar Zaidan,et al.  Z-MERT: A Fully Configurable Open Source Tool for Minimum Error Rate Training of Machine Translation Systems , 2009, Prague Bull. Math. Linguistics.

[7]  Alon Lavie,et al.  Proceedings of the Sixth Workshop on Statistical Machine Translation , 2011 .

[8]  Martin Chodorow,et al.  The Ups and Downs of Preposition Error Detection in ESL Writing , 2008, COLING.

[9]  Matthieu Hermet,et al.  Using First and Second Language Models to Correct Preposition Errors in Second Language Authoring , 2009, BEA@NAACL.

[10]  David Chiang,et al.  A Hierarchical Phrase-Based Model for Statistical Machine Translation , 2005, ACL.

[11]  Ben Taskar,et al.  Alignment by Agreement , 2006, NAACL.

[12]  Yuji Matsumoto,et al.  The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writings , 2012, COLING.

[13]  Hitoshi Isahara,et al.  Automatic Error Detection in the Japanese Learners’ English Spoken Data , 2003, ACL.

[14]  Chris Callison-Burch,et al.  Joshua 2.0: A Toolkit for Parsing-Based Machine Translation with Syntax, Semirings, Discriminative Training and Other Goodies , 2010, WMT@ACL.

[15]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[16]  Kenneth Heafield,et al.  KenLM: Faster and Smaller Language Model Queries , 2011, WMT@EMNLP.

[17]  Roger Levy,et al.  Automated Whole Sentence Grammar Correction Using a Noisy Channel Model , 2011, ACL.

[18]  Michael Gamon,et al.  Correcting ESL Errors Using Phrasal SMT Techniques , 2006, ACL.