Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction

This paper investigates how to effectively incorporate a pre-trained masked language model (MLM), such as BERT, into an encoder-decoder (EncDec) model for grammatical error correction (GEC). The answer to this question is not as straightforward as one might expect, because the common previous methods for incorporating an MLM into an EncDec model have potential drawbacks when applied to GEC. For example, the distribution of the inputs to a GEC model can differ considerably (erroneous, clumsy, etc.) from that of the corpora used for pre-training MLMs; however, this issue is not addressed by the previous methods. Our experiments show that our proposed method, in which we first fine-tune an MLM with a given GEC corpus and then use the output of the fine-tuned MLM as additional features in the GEC model, maximizes the benefit of the MLM. The best-performing model achieves state-of-the-art performance on the BEA-2019 and CoNLL-2014 benchmarks. Our code is publicly available at: this https URL.
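
The following is a minimal sketch, not the authors' implementation, of the core idea stated in the abstract: an MLM that has already been fine-tuned on the GEC corpus provides hidden states that are fed into the EncDec model as additional input features. The model name "bert-base-cased", the dimension d_model=512, and the simple additive fusion are assumptions for illustration; the paper's actual fusion mechanism may combine the MLM output with the EncDec layers differently.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class MlmFeatureEncoderInput(nn.Module):
    """Builds encoder inputs that combine ordinary source embeddings with the
    hidden states of a separately fine-tuned masked language model (MLM)."""

    def __init__(self, mlm_name="bert-base-cased", d_model=512):
        super().__init__()
        # The MLM is assumed to have been fine-tuned on the GEC corpus beforehand.
        self.mlm = BertModel.from_pretrained(mlm_name)
        self.mlm.eval()
        for p in self.mlm.parameters():
            p.requires_grad = False  # keep the fine-tuned MLM fixed
        self.proj = nn.Linear(self.mlm.config.hidden_size, d_model)
        self.embed = nn.Embedding(self.mlm.config.vocab_size, d_model)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            mlm_out = self.mlm(input_ids=input_ids, attention_mask=attention_mask)
        features = self.proj(mlm_out.last_hidden_state)
        # "Additional features": here simply added to the source embeddings for
        # illustration; a full system could instead attend to them in each layer.
        return self.embed(input_ids) + features


tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
batch = tokenizer(["She go to school yesterday ."], return_tensors="pt")
layer_input = MlmFeatureEncoderInput()(batch["input_ids"], batch["attention_mask"])
print(layer_input.shape)  # (1, sequence_length, 512)
```

The resulting tensor would then be consumed by the Transformer encoder of the EncDec GEC model in place of (or alongside) its ordinary token embeddings, so that the fine-tuned MLM's knowledge of erroneous learner text informs correction.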
