A unified and discriminative model for query refinement

This paper addresses the issue of query refinement, which involves reformulating ill-formed search queries in order to improve the relevance of search results. Query refinement typically includes a number of tasks such as spelling error correction, word splitting, word merging, phrase segmentation, word stemming, and acronym expansion. In previous research, such tasks were addressed separately or by employing generative models. This paper proposes a unified and discriminative model for query refinement. Specifically, it proposes a Conditional Random Field (CRF) model suited to the problem, referred to as Conditional Random Field for Query Refinement (CRF-QR). Given a sequence of query words, CRF-QR predicts a sequence of refined query words as well as the corresponding refinement operations. In that sense, CRF-QR differs from conventional CRF models, which predict a single label sequence for the input. Two types of CRF-QR models, namely a basic model and an extended model, are introduced. One merit of employing CRF-QR is that different refinement tasks can be performed simultaneously, and thus the accuracy of refinement can be enhanced. Furthermore, the advantages of discriminative models over generative models can be fully leveraged. Experimental results demonstrate that CRF-QR significantly outperforms baseline methods. Moreover, when CRF-QR is used in web search, a significant improvement in relevance is obtained.
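To make the input and output described above concrete, the following is a minimal sketch of how one prediction could be represented: each original query word is paired with its refined form and the operation that produced it. It does not implement CRF-QR or any learning. The class name, field names, operation labels, and the toy query are all assumptions introduced for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Refinement:
    """One position of a refined query (all names here are hypothetical)."""
    word: str       # original query word
    refined: str    # refined word(s) predicted for this position
    operation: str  # label of the refinement operation applied


def refined_query(sequence: List[Refinement]) -> str:
    """Join the refined words into the reformulated query string."""
    return " ".join(r.refined for r in sequence)


def operations(sequence: List[Refinement]) -> List[Tuple[str, str]]:
    """Pair each original word with the operation predicted for it."""
    return [(r.word, r.operation) for r in sequence]


# Hand-written toy output; a trained model would predict both the refined
# words and the operations jointly from the original words.
prediction = [
    Refinement("windws", "windows", "spelling_correction"),
    Refinement("onecare", "one care", "word_splitting"),
]

print(refined_query(prediction))
print(operations(prediction))
```

Keeping the refined word and the operation in the same record reflects the point that the two are predicted together, so a spelling correction and a word split can appear in the same refined query.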
