Correct your text with Google

With the growing number of text files produced today, spell checkers have become essential tools in the everyday tasks of millions of end users. Over the years, several tools have been designed that perform decently. Grammar checkers can improve text correction further, but they require large resources. We think that basic spell checking can be improved, as a step in that direction, by using the Web as a corpus and by taking into account the context of words identified as potential misspellings. We propose to use the Google search engine together with machine learning techniques to design a flexible and dynamic spell checker that can evolve over time as new linguistic features appear.
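To make the idea of context-aware correction from Web counts concrete, here is a minimal sketch. It is not the authors' method: it only illustrates ranking candidate corrections by how often each one occurs together with its neighbouring words. The names `rank_candidates`, `toy_hit_count`, and `TOY_COUNTS` are hypothetical, and the counts are made-up stand-ins for search-engine hit counts; no real search API is called.

```python
from typing import Callable, Dict, List, Tuple


def rank_candidates(
    left: str,
    candidates: List[str],
    right: str,
    hit_count: Callable[[str], int],
) -> List[Tuple[str, int]]:
    """Rank candidate corrections by the count of each candidate in context.

    `hit_count` is an assumed callable that maps a phrase to a frequency,
    for example the number of Web pages containing that exact phrase.
    """
    scored = []
    for word in candidates:
        # Build the phrase: left context + candidate + right context.
        phrase = " ".join(part for part in (left, word, right) if part)
        scored.append((word, hit_count(phrase)))
    # Highest count first; ties are broken alphabetically for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Invented counts, standing in for what a search engine might report.
TOY_COUNTS: Dict[str, int] = {
    "a piece of cake": 900,
    "a peace of cake": 12,
    "a pierce of cake": 0,
}


def toy_hit_count(phrase: str) -> int:
    """Hypothetical replacement for a Web query; unknown phrases count as 0."""
    return TOY_COUNTS.get(phrase, 0)


ranking = rank_candidates("a", ["peace", "piece", "pierce"], "of cake", toy_hit_count)
best = ranking[0][0]
```

In a fuller system, the per-candidate counts would presumably be one feature among several (edit distance, phonetic similarity, and so on) passed to a learned classifier, rather than being used alone as the decision rule.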
