Connecting the Dots: Towards Human-Level Grammatical Error Correction

We build a grammatical error correction (GEC) system primarily based on the state-of-the-art statistical machine translation (SMT) approach, using task-specific features and tuning, and further enhance it with the modeling power of neural network joint models. The SMT-based system generalizes poorly beyond patterns seen during training and lacks granularity below the word level. To address these weaknesses, we incorporate a character-level SMT component targeting the misspelled words that the original SMT-based system fails to correct. Our final system achieves an F0.5 score of 53.14% on the benchmark CoNLL-2014 test set, an improvement of 3.62% F0.5 over the best previously published score.
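The headline metric, F0.5, is the weighted harmonic mean of precision (P) and recall (R) over proposed edits: F_β = (1 + β²)·P·R / (β²·P + R), where β = 0.5 weights precision more heavily than recall. The sketch below only illustrates that formula and assumes the edit counts are already known. The official CoNLL-2014 MaxMatch (M2) scorer also aligns system edits with gold edits, which is not reproduced here. `f_beta` is a hypothetical helper name, and returning 1.0 for an empty denominator is an assumed convention.

```python
def f_beta(tp: int, n_proposed: int, n_gold: int, beta: float = 0.5) -> float:
    """Compute F_beta from edit counts (illustrative helper, not the M2 scorer).

    tp         -- number of proposed edits that match a gold edit
    n_proposed -- total number of edits proposed by the system
    n_gold     -- total number of gold-standard edits
    """
    # Precision: fraction of proposed edits that are correct.
    # Assumed convention: no proposed edits counts as perfect precision.
    precision = tp / n_proposed if n_proposed else 1.0
    # Recall: fraction of gold edits recovered by the system.
    # Assumed convention: no gold edits counts as perfect recall.
    recall = tp / n_gold if n_gold else 1.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    # Weighted harmonic mean; beta < 1 favors precision over recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Same number of correct edits, different precision/recall trade-offs.
print(round(f_beta(50, 100, 200), 3))  # P = 0.5,  R = 0.25
print(round(f_beta(50, 200, 100), 3))  # P = 0.25, R = 0.5
```

With the same 50 correct edits, the higher-precision system (P = 0.5, R = 0.25) scores about 0.417, while the higher-recall one (P = 0.25, R = 0.5) scores about 0.278. This is why F0.5 rewards conservative, high-precision correction.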
