A Large List of Confusion Sets for Spellchecking Assessed Against a Corpus of Real-word Errors

One method proposed for dealing with real-word errors (errors that occur when a correctly spelled word is substituted for the one intended) is the "confusion-set" approach, a confusion set being a small group of words that are likely to be confused with one another. Using a list of confusion sets drawn up in advance, a spellchecker that finds one of these words in a text can assess whether another member of its set would be a better fit and, if so, propose that word as a correction. Much of the research using this approach has suffered from two weaknesses: the number of confusion sets used has been small, and systems have largely been tested on artificial errors. In this paper we address both weaknesses. We describe the creation of a realistically sized list of confusion sets and the assembly of a corpus of real-word errors, and we then assess the potential of that list against that corpus.
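The confusion-set check described above can be sketched in a few lines of Python. This is an illustration only, not the method evaluated in the paper: the three confusion sets, the add-one-smoothed bigram score, the `margin` threshold, and all function names are assumptions chosen to keep the example small.

```python
from collections import Counter

# Hypothetical confusion sets for illustration; a realistic list is far larger.
CONFUSION_SETS = [
    {"form", "from"},
    {"their", "there"},
    {"peace", "piece"},
]


def build_bigram_counts(tokens):
    """Count adjacent word pairs in a (presumed correct) training text."""
    return Counter(zip(tokens, tokens[1:]))


def context_score(word, left, right, bigrams):
    """Score how well `word` fits between its neighbours.

    Assumed scoring: product of add-one-smoothed bigram counts.
    """
    return (bigrams[(left, word)] + 1) * (bigrams[(word, right)] + 1)


def suggest(tokens, bigrams, margin=2.0):
    """Return (index, word, replacement) for each confusable word whose
    best alternative scores at least `margin` times higher in context."""
    suggestions = []
    for i, word in enumerate(tokens):
        for conf_set in CONFUSION_SETS:
            if word not in conf_set:
                continue
            # Sentence-boundary markers stand in for missing neighbours.
            left = tokens[i - 1] if i > 0 else "<s>"
            right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
            current = context_score(word, left, right, bigrams)
            # Sort alternatives so ties are broken deterministically.
            alternatives = sorted(conf_set - {word})
            best = max(
                alternatives,
                key=lambda w: context_score(w, left, right, bigrams),
            )
            if context_score(best, left, right, bigrams) >= margin * current:
                suggestions.append((i, word, best))
    return suggestions


if __name__ == "__main__":
    training = (
        "a letter from my friend came from the bank "
        "and i got a letter from the school"
    ).split()
    bigrams = build_bigram_counts(training)
    print(suggest("i got a letter form the bank".split(), bigrams))
```

The `margin` parameter reflects a common design concern with this approach: a checker that swaps words on weak evidence risks replacing a correct word with a wrong one, so the alternative is proposed only when it fits the context clearly better than the word actually written.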
