A Comprehensive Survey of Grammar Error Correction

Grammar error correction (GEC) is an important application of natural language processing. The past decade has seen significant progress in GEC, driven by the growing popularity of machine learning and deep learning, especially in the late 2010s, when near human-level GEC systems became available. However, no prior work has reviewed this progress as a whole. We present the first survey of GEC, offering a comprehensive retrospective of the literature in this area. We first introduce five public datasets, data annotation schema, two important shared tasks and four standard evaluation metrics. More importantly, we discuss four kinds of basic approaches (statistical machine translation based, neural machine translation based, classification based and language model based), six commonly applied performance boosting techniques for GEC systems and two data augmentation methods. Since GEC is typically viewed as a sister task of machine translation, many GEC systems are based on neural machine translation (NMT) approaches, in which a neural sequence-to-sequence model is applied. Similarly, some performance boosting techniques are adapted from machine translation and successfully combined with GEC systems to improve final performance. Furthermore, we analyze basic approaches, performance boosting techniques and integrated GEC systems, each on the basis of their reported experimental results, to draw out clearer patterns and conclusions. Finally, we discuss five prospective directions for future GEC research.
