Erroneous data generation for Grammatical Error Correction

Using a monolingual corpus in neural Grammatical Error Correction (GEC) systems has been shown to improve performance significantly. The previous state-of-the-art neural GEC system is an ensemble of four Transformer models pretrained on a large amount of Wikipedia edits. The Singsound GEC system follows a similar approach but adds a sophisticated component for generating erroneous data. Our system achieved an F0.5 of 66.61 in the BEA 2019 Shared Task on Grammatical Error Correction. With this novel erroneous data generation component, the Singsound neural GEC system achieved an M² score of 63.2 on the CoNLL-2014 benchmark, an 8.4% relative improvement over the previous state-of-the-art system.
