Aleph or Aleph-Maddah, that is the question! Spelling correction for search engine autocomplete service

In this paper, we propose a combined framework, called Aleph checker (by analogy with a spell checker), that selects the correct form of Aleph (plain Aleph or Aleph-Maddah) in Perso-Arabic words. The system checks the spelling of Persian words using the user queries of a web search engine. For each word, a Soundex algorithm extracts a list of candidates that are suspected to be misspellings of the word. An n-gram-based filter prunes this list of equivalent forms, and the remaining items are counted to obtain the frequency of each spelling of the word. In the last step, the correct spelling is determined by a linear binary classifier. In the experiment conducted, the system achieves an accuracy of 0.92 in determining the correct form of Aleph.
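The counting and decision steps can be illustrated with a minimal Python sketch. It is not the authors' implementation. The equivalence key below only merges the two Aleph forms and stands in for the Soundex step, the n-gram pruning step is left out because the abstract gives no details of it, and the single log-ratio feature, the `weight` and `bias` values, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

ALEPH = "\u0627"         # plain Aleph
ALEPH_MADDAH = "\u0622"  # Aleph with Maddah


def equivalence_key(word):
    """Assumed stand-in for the Soundex step: both Aleph forms share one key."""
    return word.replace(ALEPH_MADDAH, ALEPH)


def spelling_counts(word, queries):
    """Count every spelling in the query log that shares the word's key."""
    key = equivalence_key(word)
    counts = Counter()
    for query in queries:
        for token in query.split():
            if equivalence_key(token) == key:
                counts[token] += 1
    return counts


def choose_spelling(word, queries, weight=1.0, bias=0.0):
    """Linear decision on one feature: the smoothed log-ratio of frequencies.

    weight and bias are hypothetical; in practice they would be learned
    from labelled examples.
    """
    counts = spelling_counts(word, queries)
    plain = equivalence_key(word)
    variants = [token for token in counts if token != plain]
    if not variants:
        return plain  # no Aleph-Maddah spelling was observed
    maddah = max(variants, key=counts.__getitem__)
    feature = math.log((counts[maddah] + 1) / (counts[plain] + 1))
    score = weight * feature + bias
    return maddah if score > 0 else plain


# Toy query log: the Aleph-Maddah spelling occurs twice, the plain one once.
water_maddah = ALEPH_MADDAH + "\u0628"
water_plain = ALEPH + "\u0628"
queries = [water_maddah + " x", water_maddah + " y", water_plain]
print(choose_spelling(water_plain, queries) == water_maddah)
```

With the default `weight` and `bias`, the more frequent spelling wins. A negative `bias` shifts the decision toward the plain Aleph form, which is one way a trained classifier could encode a prior.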
