Generating Confusion Sets for Context-Sensitive Error Correction

In this paper, we consider the problem of generating candidate corrections for the task of correcting errors in text. We focus on correcting errors in preposition usage made by non-native English speakers, using discriminative classifiers. The standard approach assumes that the candidate corrections for a preposition include every preposition considered in the task. Instead, we use an annotated corpus of non-native text to determine which preposition confusions are likely, and we use this knowledge to produce smaller candidate sets. We propose several methods for restricting candidate sets. These methods exclude candidate prepositions that are never observed as valid corrections in the annotated corpus and take into account the likelihood of each preposition confusion in the non-native text. We find that restricting candidates to those observed in the non-native data improves both precision and recall compared to treating all prepositions as possible candidates. Furthermore, the method that takes into account the likelihood of each preposition confusion proves the most effective.
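The candidate-restriction idea can be illustrated with a small sketch. It is not the authors' implementation: it assumes, hypothetically, that the annotated corpus has been reduced to (writer's preposition, annotator's correction) pairs, and the function names, the toy data, and the `min_prob` threshold are illustrative choices. `observed_candidates` keeps only corrections seen in the annotated data, while `likely_candidates` additionally drops confusions whose relative frequency falls below the threshold.

```python
from collections import Counter, defaultdict


def confusion_counts(pairs: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count how often each written preposition is corrected to each target.

    Each pair is (preposition used by the writer, preposition chosen by the
    annotator); identical pairs record correct usages.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    return counts


def observed_candidates(counts: dict[str, Counter], source: str) -> set[str]:
    """Restrict candidates to the source itself plus corrections seen in the data."""
    return {source} | set(counts.get(source, Counter()))


def confusion_likelihoods(counts: dict[str, Counter], source: str) -> dict[str, float]:
    """Estimate P(correction | source) by relative frequency."""
    row = counts.get(source, Counter())
    total = sum(row.values())
    if total == 0:
        return {}
    return {target: n / total for target, n in row.items()}


def likely_candidates(counts: dict[str, Counter], source: str,
                      min_prob: float = 0.05) -> set[str]:
    """Keep observed corrections whose estimated likelihood is at least min_prob.

    min_prob is a hypothetical threshold chosen for illustration.
    """
    probs = confusion_likelihoods(counts, source)
    return {source} | {t for t, p in probs.items() if p >= min_prob}


if __name__ == "__main__":
    # Toy annotated data, invented for illustration (not from any corpus).
    pairs = [
        ("in", "in"), ("in", "in"), ("in", "on"), ("in", "on"), ("in", "at"),
        ("on", "on"), ("on", "in"), ("for", "to"),
    ]
    counts = confusion_counts(pairs)
    print(sorted(observed_candidates(counts, "in")))              # ['at', 'in', 'on']
    print(sorted(likely_candidates(counts, "in", min_prob=0.3)))  # ['in', 'on']
```

Keeping the source preposition in every candidate set lets a classifier still predict "no change". In this sketch a preposition with no annotated confusions collapses to a singleton set and is therefore never corrected; a real system might instead fall back to the full preposition set in that case.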
