Ling@CASS Solution to the NLP-TEA CGED Shared Task 2018

In this study, we use sequence-to-sequence learning to model grammatical error correction. The system takes potentially erroneous sentences as input and outputs corrected sentences. Manually labeled data is very limited, and to overcome this bottleneck we adopt a semi-supervised approach. Specifically, we modify correct sentences written by native Chinese speakers to generate pseudo grammatical errors that resemble those made by learners of Chinese as a second language. We use the pseudo data to pre-train the model and the CGED data to fine-tune it. In real-world scenarios, precision is particularly important for a grammar error correction system, so we use ensembles to boost it. With inputs as simple as Chinese characters, the ensembled system achieves a precision of 86.56% in detecting erroneous sentences and a precision of 51.53% in correcting errors of the Selection and Missing types.
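The sketch below illustrates two ideas from the abstract under assumptions that the abstract does not state. The first is corrupting correct native sentences into (pseudo-erroneous source, correct target) pairs for pre-training. The second is combining several models so that precision is favored over recall. Several details are illustrative choices rather than the authors' actual procedure:

- the four edit operations, loosely mapped to the CGED labels Missing, Selection, Redundant and Word ordering;
- the small confusion set;
- the unanimous-vote rule.

The function names `corrupt` and `ensemble_correct` are hypothetical.

```python
import random

# Assumed edit types, loosely following CGED labels:
# M = Missing, S = Selection, R = Redundant, W = Word ordering.
ERROR_TYPES = ("M", "S", "R", "W")


def corrupt(sentence, confusion, rng):
    """Turn one correct sentence into a (source, target, error_type) triple.

    `confusion` maps a character to plausible wrong substitutes (an assumed
    confusion set). If no error can be produced, the sentence is returned
    unchanged with error_type None.
    """
    chars = list(sentence)
    if len(chars) < 2:
        return sentence, sentence, None
    etype = rng.choice(ERROR_TYPES)
    i = rng.randrange(len(chars))
    if etype == "M":
        # Missing: the learner omits a character, so delete it from the source.
        del chars[i]
    elif etype == "S":
        # Selection: the learner picks a wrong character from the confusion set.
        candidates = confusion.get(chars[i])
        if not candidates:
            return sentence, sentence, None
        chars[i] = rng.choice(candidates)
    elif etype == "R":
        # Redundant: the learner adds a superfluous character (here a duplicate).
        chars.insert(i, chars[i])
    else:
        # Word ordering: swap two adjacent characters.
        j = min(i, len(chars) - 2)
        chars[j], chars[j + 1] = chars[j + 1], chars[j]
    source = "".join(chars)
    if source == sentence:
        # E.g. swapping two identical characters changes nothing.
        return sentence, sentence, None
    return source, sentence, etype


def ensemble_correct(source, models):
    """Accept a correction only if every model proposes the same output.

    `models` are callables mapping a source sentence to a corrected one
    (stand-ins for independently trained seq2seq models). On disagreement
    the source is left unchanged, trading recall for precision.
    """
    outputs = {model(source) for model in models}
    if len(outputs) == 1:
        return outputs.pop()
    return source


if __name__ == "__main__":
    rng = random.Random(0)
    confusion = {"的": ["得", "地"], "在": ["再"]}
    for s in ["我在北京学习中文。", "他的书在桌子上。"]:
        print(corrupt(s, confusion, rng))
```

The corrupted sources would feed the pre-training stage and the genuine CGED pairs the fine-tuning stage. Requiring unanimity is the strictest possible combination rule. A majority vote would recover some recall at the cost of some precision.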
