Minimally-Augmented Grammatical Error Correction

There has been an increased interest in low-resource approaches to automatic grammatical error correction. We introduce Minimally-Augmented Grammatical Error Correction (MAGEC) that does not require any error-labelled data. Our unsupervised approach is based on a simple but effective synthetic error generation method based on confusion sets from inverted spell-checkers. In low-resource settings, we outperform the current state-of-the-art results for German and Russian GEC tasks by a large margin without using any real error-annotated training data. When combined with labelled data, our method can serve as an efficient pre-training technique

[1]  Marcin Junczys-Dowmunt,et al.  Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data , 2019, BEA@ACL.

[2]  Hwee Tou Ng,et al.  Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English , 2013, BEA@NAACL-HLT.

[3]  Ted Briscoe,et al.  Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction , 2017, ACL.

[4]  Ted Briscoe,et al.  The BEA-2019 Shared Task on Grammatical Error Correction , 2019, BEA@ACL.

[5]  Joel Tetreault,et al.  The Unbearable Weight of Generating Artificial Errors for Grammatical Error Correction , 2019, BEA@ACL.

[6]  Ming Zhou,et al.  Fluency Boost Learning and Inference for Neural Grammatical Error Correction , 2018, ACL.

[7]  Ilya Segalovich,et al.  A Fast Morphological Algorithm with Unknown Word Guessing Induced by a Dictionary for a Web Search Engine , 2003, MLMTA.

[8]  Mamoru Komachi,et al.  Controlling Grammatical Error Correction Using Word Edit Rate , 2019, ACL.

[9]  Ted Briscoe,et al.  Language Model Based Grammatical Error Correction without Annotated Training Data , 2018, BEA@NAACL-HLT.

[10]  Adriane Boyd,et al.  Using Wikipedia Edits in Low Resource Grammatical Error Correction , 2018, NUT@EMNLP.

[11]  Dan Roth,et al.  Adapting to Learner Errors with Minimal Supervision , 2017, CL.

[12]  Sebastian Riedel,et al.  Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection , 2018, EMNLP.

[13]  Philipp Koehn,et al.  Findings of the 2018 Conference on Machine Translation (WMT18) , 2018, WMT.

[14]  Dan Roth,et al.  Generating Confusion Sets for Context-Sensitive Error Correction , 2010, EMNLP.

[15]  Joel R. Tetreault,et al.  JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction , 2017, EACL.

[16]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[17]  Taku Kudo,et al.  SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing , 2018, EMNLP.

[18]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[19]  Zheng Yuan,et al.  Generating artificial errors for grammatical error correction , 2014, EACL.

[20]  Daniel Jurafsky,et al.  Noising and Denoising Natural Language: Diverse Backtranslation for Grammar Correction , 2018, NAACL.

[21]  Bill Byrne,et al.  Neural Grammatical Error Correction with Finite State Transducers , 2019, NAACL.

[22]  Ted Briscoe,et al.  Artificial Error Generation with Machine Translation and Syntactic Patterns , 2017, BEA@EMNLP.

[23]  Sylviane Granger The computer learner corpus: a versatile new source of data for SLA research: Sylviane Granger , 2014 .

[24]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[25]  Wei Zhao,et al.  Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data , 2019, NAACL.

[26]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[27]  Shankar Kumar,et al.  Weakly Supervised Grammatical Error Correction using Iterative Decoding , 2018, ArXiv.

[28]  Marcin Junczys-Dowmunt,et al.  The WikEd Error Corpus: A Corpus of Corrective Wikipedia Edits and Its Application to Grammatical Error Correction , 2014, PolTAL.

[29]  Sylviane Granger,et al.  The computer learner corpus: a versatile new source of data for SLA research , 1998 .

[30]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[31]  Hwee Tou Ng,et al.  Better Evaluation for Grammatical Error Correction , 2012, NAACL.

[32]  Rico Sennrich,et al.  Regularization techniques for fine-tuning in neural machine translation , 2017, EMNLP.

[33]  Marcin Junczys-Dowmunt,et al.  Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task , 2018, NAACL.

[34]  Dan Roth,et al.  Grammar Error Correction in Morphologically Rich Languages: The Case of Russian , 2019, TACL.