Training Paradigms for Correcting Errors in Grammar and Usage

This paper proposes a novel approach to training classifiers that detect and correct grammar and usage errors in text: selectively introducing mistakes into the training data. When training a classifier, we would like the distribution of examples seen in training to be as similar as possible to the one seen in testing. In error correction problems, such as correcting mistakes made by second language learners, a system is typically trained on correct data, since annotating learner data for training is expensive. Error generation methods avoid this expensive annotation by creating training data that resemble non-native data with errors. We apply error generation methods and train classifiers for detecting and correcting article errors in essays written by non-native English speakers, and we show that training on data that contain errors yields higher accuracy than training on clean native data. We propose several training paradigms with error generation and show that each is superior to training a classifier on native data. We also show that the most successful error generation methods are those that exploit knowledge about the article distribution and error patterns observed in non-native text.
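The core idea, corrupting clean native text according to error statistics observed in learner data, can be sketched in a few lines. The Python snippet below is a minimal illustration and not the system described in the paper: the confusion probabilities are invented, insertion of spurious articles is omitted, and all names are hypothetical.

import random

# Illustrative confusion distribution over articles: for each correct choice,
# the probability that a non-native writer produces each alternative ("" means
# the article is dropped). These numbers are made up for this sketch; in the
# paper's setting they would be estimated from annotated learner text.
CONFUSION = {
    "the": {"the": 0.90, "a": 0.04, "an": 0.01, "": 0.05},
    "a":   {"a": 0.89, "an": 0.01, "the": 0.05, "": 0.05},
    "an":  {"an": 0.86, "a": 0.04, "the": 0.05, "": 0.05},
}

ARTICLES = set(CONFUSION)

def sample_replacement(article, rng):
    # Sample a (possibly unchanged, possibly empty) article for one occurrence.
    choices, weights = zip(*CONFUSION[article].items())
    return rng.choices(choices, weights=weights, k=1)[0]

def inject_article_errors(tokens, seed=0):
    # Walk over clean (native) tokens and corrupt article occurrences according
    # to the confusion distribution; insertion errors (adding an article where
    # none belongs) are omitted because they would require noun-phrase detection.
    rng = random.Random(seed)
    noisy = []
    for tok in tokens:
        if tok.lower() in ARTICLES:
            replacement = sample_replacement(tok.lower(), rng)
            if replacement:                 # substitution (often the identity)
                noisy.append(replacement)
            # an empty replacement means the article is deleted
        else:
            noisy.append(tok)
    return noisy

if __name__ == "__main__":
    clean = "I left the book on a table near an open window".split()
    print(" ".join(inject_article_errors(clean, seed=3)))

Training a classifier on the output of such a procedure exposes it to the kinds of article confusions it will see at test time, rather than only to well-formed native text.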
