Memory-based context-sensitive spelling correction at web scale

We study the problem of correcting spelling mistakes in text using memory-based learning techniques and a very large database of token n-gram occurrences in web text as training data. Our approach uses the context in which an error appears to select the most likely candidate from words which might have been intended in its place. Using a novel correction algorithm and a massive database of training data, we demonstrate higher accuracy on correcting real- word errors than previous work, and very high accuracy at a new task of ranking corrections to non-word errors given by a standard spelling correction package.