Correcting Spelling Errors by Modelling Their Causes

This paper presents a new technique for correcting isolated words in typed texts. A language-dependent set of string substitutions reflects the surface form of errors that result from vocabulary incompetence, misspellings, or mistypings. Candidate corrections are formed by applying the substitutions to text words absent from the computer lexicon. A minimal acyclic deterministic finite automaton storing the lexicon allows quick rejection of nonsense corrections, while costs associated with the substitutions serve to rank the remaining ones. A comparison of the correction lists generated by several spellcheckers for two corpora of English spelling errors shows that our technique suggests the right words more accurately than the others.
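
The pipeline described above (apply substitutions to an unknown word, reject candidates missing from the lexicon, rank the survivors by cost) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the rules, their costs, and the tiny lexicon are invented for the example, a plain Python set stands in for the minimal acyclic deterministic finite automaton, and only a single substitution is applied per candidate.

```python
# Hypothetical substitution rules: (erroneous fragment, intended fragment, cost).
# Both the rules and the costs are made up for illustration.
RULES = [
    ("ie", "ei", 1.0),  # common misspelling pattern
    ("f", "ph", 2.0),   # phonetic confusion
    ("h", "", 3.0),     # spurious letter
]

# Toy lexicon. The paper stores the lexicon in a minimal acyclic DFA;
# a frozenset is used here only as a simple stand-in for membership tests.
LEXICON = frozenset({"receive", "phone", "their", "tier"})


def suggest(word, rules=RULES, lexicon=LEXICON):
    """Return candidate corrections for `word`, cheapest first."""
    if word in lexicon:
        return []  # known words are not corrected

    best = {}  # candidate -> lowest cost found so far
    for wrong, right, cost in rules:
        start = word.find(wrong)
        while start != -1:
            # Apply the substitution at this single position.
            candidate = word[:start] + right + word[start + len(wrong):]
            # Reject candidates absent from the lexicon; keep the cheapest cost.
            if candidate in lexicon and cost < best.get(candidate, float("inf")):
                best[candidate] = cost
            start = word.find(wrong, start + 1)

    # Rank by cost, breaking ties alphabetically.
    return sorted(best, key=lambda c: (best[c], c))


if __name__ == "__main__":
    for typo in ("recieve", "fone", "thier"):
        print(typo, suggest(typo))
```

With these invented rules, "thier" yields two lexicon words, and the cheaper substitution places "their" ahead of "tier". A real system would replace the set with an automaton so that candidates with no valid prefix can be abandoned early.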

[1]  Fred J. Damerau,et al.  A technique for computer detection and correction of spelling errors , 1964, CACM.

[2]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[3]  Donald R. Morrison,et al.  PATRICIA—Practical Algorithm To Retrieve Information Coded in Alphanumeric , 1968, J. ACM.

[4]  Donald Ervin Knuth,et al.  The Art of Computer Programming , 1968 .

[5]  Donald E. Knuth,et al.  The Art of Computer Programming, Vol. 3: Sorting and Searching , 1974 .

[6]  Robert A. Wagner,et al.  Order-n correction for regular languages , 1974, CACM.

[7]  Kurt Maly Compressed tries , 1976, CACM.

[8]  Emmanuel J. Yannakoudakis,et al.  The rules of spelling errors , 1983, Inf. Process. Manag..

[9]  Antonio Zamora,et al.  Collection and characterization of spelling errors in scientific and scholarly text , 1983, J. Am. Soc. Inf. Sci..

[10]  Emmanuel J. Yannakoudakis,et al.  An intelligent spelling error corrector , 1983, Inf. Process. Manag..

[11]  G. Nathan,et al.  The phonology of modern English , 1983 .

[12]  Antonio Zamora,et al.  Automatic spelling correction in scientific and scholarly text , 1984, CACM.

[13]  James L. Peterson,et al.  A note on undetected typing errors , 1986, CACM.

[14]  Roger Mitton,et al.  Spelling checkers, spelling correctors and the misspellings of poor spellers , 1987, Inf. Process. Manag..

[15]  Koenraad De Smedt,et al.  Triphone Analysis: A Combined Method For The Correction Of Orthographical And Typographical Errors , 1988, ANLP.

[16]  Fred J. Damerau,et al.  An examination of undetected typing errors , 1989, Inf. Process. Manag..

[17]  T. N. Gadd,et al.  PHOENIX: the algorithm , 1990 .

[18]  Fred J. Damerau Evaluating computer-generated domain-oriented vocabularies , 1990, Inf. Process. Manag..

[19]  L. Philips,et al.  Hanging on the metaphone , 1990 .

[20]  Kenneth Ward Church,et al.  Probability scoring for spelling correction , 1991 .

[21]  Karen Kukich,et al.  Techniques for automatically correcting words in text , 1992, CSUR.

[22]  Ian H. Witten,et al.  Bonsai: A compact representation of trees , 1993, Softw. Pract. Exp..

[23]  Kemal Oflazer,et al.  Error-tolerant Finite-state Recognition with Applications to Morphological Analysis and Spelling Correction , 1995, CL.

[24]  OflazerKemal Error-tolerant finite-state recognition with applications to morphological analysis and spelling correction , 1996 .

[25]  George Havas,et al.  Perfect Hashing , 1997, Theor. Comput. Sci..

[26]  Ricardo A. Baeza-Yates,et al.  Fast approximate string matching in a dictionary , 1998, Proceedings. String Processing and Information Retrieval: A South American Symposium (Cat. No.98EX207).

[27]  Bruce W. Watson,et al.  Incremental construction of minimal acyclic finite state automata , 2000, CL.

[28]  Eric Brill,et al.  An Improved Error Model for Noisy Channel Spelling Correction , 2000, ACL.

[29]  Lawrence Philips,et al.  The double metaphone search algorithm , 2000 .

[30]  Agata Savary Typographical Nearest-Neighbor Search in a Finite-State Lexicon and Its Application to Spelling Correction , 2001, CIAA.

[31]  Sebastian Deorowicz,et al.  How to squeeze a lexicon , 2001, Softw. Pract. Exp..

[32]  Kristina Toutanova,et al.  Pronunciation Modeling for Improved Spelling Correction , 2002, ACL.

[33]  Mikel L. Forcada,et al.  Incremental Construction and Maintenance of Minimal Finite-State Automata , 2002, CL.

[34]  Klaus U. Schulz,et al.  Fast string correction with Levenshtein automata , 2002, International Journal on Document Analysis and Recognition.

[35]  Victoria J. Hodge,et al.  A Comparison of Standard Spell Checking Algorithms and a Novel Binary Neural Approach , 2003, IEEE Trans. Knowl. Data Eng..

[36]  Klaus U. Schulz,et al.  Fast Approximate Search in Large Dictionaries , 2004, CL.