Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction

This paper investigates how to effectively incorporate a pre-trained masked language model (MLM), such as BERT, into an encoder-decoder (EncDec) model for grammatical error correction (GEC). The answer to this question is not as straightforward as one might expect, because the common previous methods for incorporating an MLM into an EncDec model have potential drawbacks when applied to GEC. For example, the distribution of the inputs to a GEC model can differ considerably (erroneous, clumsy, etc.) from that of the corpora used for pre-training MLMs; however, this issue is not addressed by the previous methods. Our experiments show that our proposed method, in which we first fine-tune an MLM with a given GEC corpus and then use the output of the fine-tuned MLM as additional features in the GEC model, maximizes the benefit of the MLM. The best-performing model achieves state-of-the-art performance on the BEA-2019 and CoNLL-2014 benchmarks. Our code is publicly available at: this https URL.
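
The following is a minimal sketch, not the authors' implementation, of the core idea stated in the abstract: an MLM that has already been fine-tuned on the GEC corpus provides hidden states that are fed into the EncDec model as additional input features. The model name "bert-base-cased", the dimension d_model=512, and the simple additive fusion are assumptions for illustration; the paper's actual fusion mechanism may combine the MLM output with the EncDec layers differently.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class MlmFeatureEncoderInput(nn.Module):
    """Builds encoder inputs that combine ordinary source embeddings with the
    hidden states of a separately fine-tuned masked language model (MLM)."""

    def __init__(self, mlm_name="bert-base-cased", d_model=512):
        super().__init__()
        # The MLM is assumed to have been fine-tuned on the GEC corpus beforehand.
        self.mlm = BertModel.from_pretrained(mlm_name)
        self.mlm.eval()
        for p in self.mlm.parameters():
            p.requires_grad = False  # keep the fine-tuned MLM fixed
        self.proj = nn.Linear(self.mlm.config.hidden_size, d_model)
        self.embed = nn.Embedding(self.mlm.config.vocab_size, d_model)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            mlm_out = self.mlm(input_ids=input_ids, attention_mask=attention_mask)
        features = self.proj(mlm_out.last_hidden_state)
        # "Additional features": here simply added to the source embeddings for
        # illustration; a full system could instead attend to them in each layer.
        return self.embed(input_ids) + features


tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
batch = tokenizer(["She go to school yesterday ."], return_tensors="pt")
layer_input = MlmFeatureEncoderInput()(batch["input_ids"], batch["attention_mask"])
print(layer_input.shape)  # (1, sequence_length, 512)
```

The resulting tensor would then be consumed by the Transformer encoder of the EncDec GEC model in place of (or alongside) its ordinary token embeddings, so that the fine-tuned MLM's knowledge of erroneous learner text informs correction.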
