Applying Winnow to Context-Sensitive Spelling Correction

Multiplicative weight-updating algorithms such as Winnow have been studied extensively in the COLT literature, but only recently have people started to use them in applications. In this paper, we apply a Winnow-based algorithm to a task in natural language: context-sensitive spelling correction. This is the task of fixing spelling errors that happen to result in valid words, such as substituting "to" for "too", "casual" for "causal", and so on. Previous approaches to this problem have been statistics-based; we compare Winnow to one of the more successful such approaches, which uses Bayesian classifiers. We find that: (1) When the standard (heavily-pruned) set of features is used to describe problem instances, Winnow performs comparably to the Bayesian method; (2) When the full (unpruned) set of features is used, Winnow is able to exploit the new features and convincingly outperform Bayes; and (3) When a test set is encountered that is dissimilar to the training set, Winnow is better than Bayes at adapting to the unfamiliar test set, using a strategy we will present for combining learning on the training set with unsupervised learning on the (noisy) test set.
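To make the multiplicative-update idea concrete, the following is a minimal sketch of the classic Winnow2 algorithm (Littlestone, 1987), not the paper's exact variant: weights start at 1, the threshold is set to the number of features, and on each mistake the weights of the active features are multiplied or divided by a promotion factor alpha. The function names and the choice of alpha are illustrative assumptions.

```python
# Sketch of classic Winnow2 over Boolean feature vectors.
# Weights start at 1; threshold theta = n; on a false negative the
# active features' weights are multiplied by alpha, on a false
# positive they are divided by alpha. (Illustrative, not the
# paper's exact Winnow variant.)

def winnow_train(samples, n, alpha=2.0, epochs=50):
    """samples: list of (x, y) pairs, x a length-n 0/1 vector, y in {0, 1}."""
    w = [1.0] * n
    theta = float(n)
    for _ in range(epochs):
        for x, y in samples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
            if pred == y:
                continue
            if y == 1:  # false negative: promote active features
                w = [wi * alpha if xi else wi for wi, xi in zip(w, x)]
            else:       # false positive: demote active features
                w = [wi / alpha if xi else wi for wi, xi in zip(w, x)]
    return w, theta

def winnow_predict(w, theta, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
```

Because updates are multiplicative and touch only active features, the mistake bound grows only logarithmically with the total number of features, which is why Winnow can afford the full unpruned feature set that the abstract describes.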
