Generalized syntactic and semantic models of query reformulation

We present a novel approach to query reformulation that combines syntactic and semantic information by means of generalized Levenshtein distance algorithms in which the substitution operation costs are based on probabilistic term rewrite functions. We investigate unsupervised, compact, and efficient models and provide empirical evidence of their effectiveness. We further explore a generative model of query reformulation and supervised combination methods that provide improved performance at variable computational cost. Among other desirable properties, our similarity measures incorporate information-theoretic interpretations of taxonomic relations such as specification and generalization.
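To make the idea of a generalized Levenshtein distance with probabilistic substitution costs concrete, the following is a minimal sketch in Python. It is not the implementation from the paper: the token-level alignment, the unit insertion and deletion costs, the cost definition `1 - p(target | source)`, and the toy `REWRITE_PROB` table with its values are all assumptions made purely for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical rewrite probabilities p(target | source); values are invented.
REWRITE_PROB: Dict[Tuple[str, str], float] = {
    ("cheap", "inexpensive"): 0.5,
    ("flights", "airfare"): 0.25,
}


def rewrite_cost(source: str, target: str) -> float:
    """Assumed substitution cost: 0 for identical terms, else 1 - p(target | source)."""
    if source == target:
        return 0.0
    return 1.0 - REWRITE_PROB.get((source, target), 0.0)


def generalized_levenshtein(
    source: Sequence[str],
    target: Sequence[str],
    sub_cost: Callable[[str, str], float],
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Edit distance over query terms with a pluggable substitution cost."""
    m, n = len(source), len(target)
    # dist[i][j] is the cheapest way to turn source[:i] into target[:j].
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,  # delete source[i-1]
                dist[i][j - 1] + ins_cost,  # insert target[j-1]
                dist[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return dist[m][n]


if __name__ == "__main__":
    q1 = "cheap flights".split()
    q2 = "inexpensive flights".split()
    print(generalized_levenshtein(q1, q2, rewrite_cost))
```

Because the substitution cost depends on the direction of the rewrite, a distance built this way need not be symmetric: under the toy table, rewriting "cheap" as "inexpensive" costs less than the reverse. An asymmetric cost of this kind is one way a measure could treat specification and generalization differently, although the cost functions used in the paper may be defined otherwise.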
