Spelling Correction for Search Engine Queries

Search engines have become the primary means of accessing information on the Web. However, recent studies show that misspelled words are very common in queries to these systems. When users misspell a query, the results are incorrect or inconclusive. In this work, we discuss the integration of a spelling correction component into tumba!, our community Web search engine. We present an algorithm that attempts to select the best choice among all possible corrections for a misspelled term, and discuss its implementation based on a ternary search tree data structure.
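To make the data structure concrete, the following is a minimal sketch of a ternary search tree used as a dictionary for correction, not the implementation used in tumba!. Everything in it is an assumption made for illustration: the names `TernarySearchTree`, `edit_distance`, and `correct`, the `max_dist` threshold, and the ranking rule (smallest edit distance first, then highest stored word frequency). The abstract does not state how the best correction is chosen.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass
class Node:
    ch: str
    lo: Optional["Node"] = None   # words whose character here is smaller
    eq: Optional["Node"] = None   # continuation after matching this character
    hi: Optional["Node"] = None   # words whose character here is larger
    freq: int = 0                 # > 0 marks the end of a stored word


class TernarySearchTree:
    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, word: str, freq: int = 1) -> None:
        if word:
            self.root = self._insert(self.root, word, 0, freq)

    def _insert(self, node: Optional[Node], word: str, i: int, freq: int) -> Node:
        ch = word[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i, freq)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i, freq)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1, freq)
        else:
            node.freq += freq
        return node

    def frequency(self, word: str) -> int:
        """Return the stored frequency of word, or 0 if it is absent."""
        node, i = self.root, 0
        while node is not None and word:
            ch = word[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(word):
                node, i = node.eq, i + 1
            else:
                return node.freq
        return 0

    def words(self) -> Iterator[Tuple[str, int]]:
        """Yield (word, freq) pairs in lexicographic order."""
        def walk(node: Optional[Node], prefix: str):
            if node is None:
                return
            yield from walk(node.lo, prefix)
            if node.freq:
                yield prefix + node.ch, node.freq
            yield from walk(node.eq, prefix + node.ch)
            yield from walk(node.hi, prefix)
        yield from walk(self.root, "")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def correct(tree: TernarySearchTree, term: str, max_dist: int = 2) -> str:
    """Return term if known, else the closest stored word (assumed ranking)."""
    if tree.frequency(term):
        return term
    best = None
    for word, freq in tree.words():
        d = edit_distance(term, word)
        if d <= max_dist:
            key = (d, -freq)  # closer first, then more frequent
            if best is None or key < best[0]:
                best = (key, word)
    return best[1] if best else term


if __name__ == "__main__":
    tree = TernarySearchTree()
    for w, f in [("search", 10), ("engine", 5), ("query", 7), ("queue", 3)]:
        tree.insert(w, f)
    print(correct(tree, "serch"), correct(tree, "qurey"))
```

For brevity this sketch scans every stored word when looking for candidates. A practical version would more likely prune the tree traversal once a branch can no longer stay within the distance threshold.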
