Personalizing Grammatical Error Correction: Adaptation to Proficiency Level and L1

Grammatical error correction (GEC) systems have become ubiquitous in a variety of software applications and have started to approach human-level performance on some datasets. However, very little is known about how to efficiently personalize these systems to a user's characteristics, such as their proficiency level and first language (L1), or to emerging domains of text. We present the first results on adapting a general-purpose neural GEC system to both the proficiency level and the first language of a writer, using only a few thousand annotated sentences. Our study is the broadest of its kind, covering five proficiency levels and twelve first languages, and comparing three adaptation scenarios: adapting to the proficiency level only, to the first language only, or to both simultaneously. We show that tailoring to both aspects at once achieves the largest performance improvement (3.6 F0.5) over a strong baseline.
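The adaptation recipe the abstract describes, continued training (fine-tuning) of a pretrained general-purpose model on a small group-specific sample, can be sketched as follows. Everything here is an illustrative stand-in: the one-parameter logistic "model", the synthetic in-group data, and the `adapt` function are not the paper's actual neural seq2seq system, only a minimal demonstration of the fine-tuning loop.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nll(w, data):
    # Mean negative log-likelihood of a one-parameter logistic model
    # p(y=1 | x) = sigmoid(w * x) on (feature, label) pairs.
    eps = 1e-12
    return -sum(y * math.log(sigmoid(w * x) + eps) +
                (1 - y) * math.log(1.0 - sigmoid(w * x) + eps)
                for x, y in data) / len(data)

def adapt(w, data, lr=0.1, epochs=5):
    # Continued training (fine-tuning) on a small in-group sample,
    # e.g. sentences from one proficiency level, one L1, or their
    # intersection. A small learning rate limits drift away from the
    # general-purpose starting point (guarding against forgetting).
    for _ in range(epochs):
        for x, y in data:
            grad = (sigmoid(w * x) - y) * x  # dNLL/dw for one example
            w -= lr * grad
    return w

# Hypothetical setup: a weight obtained from "general" training, then a
# small sample of examples from the target writer group.
w_general = 0.5
in_group = [(x, 1 if x > 0 else 0)
            for x in (random.uniform(-2, 2) for _ in range(200))]
w_adapted = adapt(w_general, in_group)
```

After adaptation, the model fits the in-group sample better than the general-purpose starting point did; the paper's scenarios differ only in which slice of data (proficiency level, L1, or both) plays the role of `in_group`.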
