An Empirical Study of Incorporating Pseudo Data into Grammatical Error Correction

The incorporation of pseudo data in the training of grammatical error correction models has been one of the main factors in improving the performance of such models. However, consensus is lacking on experimental configurations, namely, choosing how the pseudo data should be generated or used. In this study, these choices are investigated through extensive experiments, and state-of-the-art performance is achieved on the CoNLL-2014 test set (F0.5=65.0) and the official test set of the BEA-2019 shared task (F0.5=70.2) without making any modifications to the model architecture.

[1]  Marcin Junczys-Dowmunt,et al.  Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data , 2019, BEA@ACL.

[2]  Yuji Matsumoto,et al.  Mining Revision Log of Language Learning SNS for Automated Japanese Error Correction of Second Language Learners , 2011, IJCNLP.

[3]  Jianfeng Gao,et al.  A Nested Attention Neural Hybrid Model for Grammatical Error Correction , 2017, ACL.

[4]  Noam M. Shazeer,et al.  Corpora Generation for Grammatical Error Correction , 2019, NAACL.

[5]  Myle Ott,et al.  Understanding Back-Translation at Scale , 2018, EMNLP.

[6]  Helen Yannakoudakis,et al.  Developing an automated writing placement system for ESL learners , 2018 .

[7]  Rico Sennrich,et al.  Improving Neural Machine Translation Models with Monolingual Data , 2015, ACL.

[8]  Marcin Junczys-Dowmunt,et al.  Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation , 2018, NAACL.

[9]  Matt Post,et al.  GLEU Without Tuning , 2016, ArXiv.

[10]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[11]  H. Ng,et al.  A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction , 2018, AAAI.

[12]  Helen Yannakoudakis,et al.  A New Dataset and Method for Automatically Grading ESOL Texts , 2011, ACL.

[13]  Ted Briscoe,et al.  Automatic Extraction of Learner Errors in ESL Sentences Using Linguistically Enhanced Alignments , 2016, COLING.

[14]  Joel R. Tetreault,et al.  JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction , 2017, EACL.

[15]  Rico Sennrich,et al.  Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.

[16]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[17]  Hwee Tou Ng,et al.  Better Evaluation for Grammatical Error Correction , 2012, NAACL.

[18]  Rico Sennrich,et al.  Edinburgh Neural Machine Translation Systems for WMT 16 , 2016, WMT.

[19]  Philipp Koehn,et al.  Six Challenges for Neural Machine Translation , 2017, NMT@ACL.

[20]  Ted Briscoe,et al.  Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction , 2017, ACL.

[21]  Ming Zhou,et al.  Fluency Boost Learning and Inference for Neural Grammatical Error Correction , 2018, ACL.

[22]  Myle Ott,et al.  fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.

[23]  Raymond Hendy Susanto,et al.  The CoNLL-2014 Shared Task on Grammatical Error Correction , 2014 .

[24]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Yuji Matsumoto,et al.  Tense and Aspect Error Correction for ESL Learners Using Global Context , 2012, ACL.

[26]  Sylviane Granger,et al.  The computer learner corpus: a versatile new source of data for SLA research , 1998 .

[27]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[28]  Hwee Tou Ng,et al.  Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English , 2013, BEA@NAACL-HLT.

[29]  Sylviane Granger The computer learner corpus: a versatile new source of data for SLA research: Sylviane Granger , 2014 .

[30]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[31]  Daniel Jurafsky,et al.  Noising and Denoising Natural Language: Diverse Backtranslation for Grammar Correction , 2018, NAACL.

[32]  Rico Sennrich,et al.  The University of Edinburgh’s Neural MT Systems for WMT17 , 2017, WMT.

[33]  Wei Zhao,et al.  Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data , 2019, NAACL.

[34]  Noam Shazeer,et al.  Adafactor: Adaptive Learning Rates with Sublinear Memory Cost , 2018, ICML.

[35]  Hiroki Asano,et al.  The AIP-Tohoku System at the BEA-2019 Shared Task , 2019, BEA@ACL.

[36]  Marcin Junczys-Dowmunt,et al.  Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task , 2018, NAACL.

[37]  Matt Post,et al.  Ground Truth for Grammatical Error Correction Metrics , 2015, ACL.