Clustering query refinements by user intent

We address the problem of clustering the refinements of a user search query. The clusters computed by our algorithm can be used to improve the selection and placement of the query suggestions offered by a search engine, and can also summarize the different aspects of information relevant to the original query. The algorithm clusters refinements by their likely underlying user intents, combining document click information with session co-occurrence information. At its core, it performs multiple random walks on a Markov graph that approximates user search behavior. A user study performed on top search engine queries shows that our clusters are rated better than the corresponding clusters computed by approaches that use only document click information or only session co-occurrence information.
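To make the idea concrete, the following is a minimal illustrative sketch, not the algorithm evaluated in the paper. It assumes a toy graph in which refinement nodes link to other refinements (session co-occurrence) and to document nodes (clicks), and in which documents are absorbing. Instead of sampling walks, it propagates probability mass for a fixed number of steps, which gives the expected outcome of many random walks. The function names, the click probability `eps`, the step count, the cosine threshold, the greedy grouping step, and all data values are assumptions made for illustration only.

```python
import math


def absorption_distribution(start, session_next, clicks, eps=0.6, steps=3):
    """Distribution over documents reached by walks starting at `start`.

    At each step a walker on a refinement clicks a document with
    probability `eps` (and is absorbed there) or moves to a co-occurring
    refinement otherwise. On the final step all remaining mass clicks.
    """
    mass = {start: 1.0}   # probability mass still on refinement nodes
    absorbed = {}         # probability mass absorbed at document nodes
    for step in range(steps):
        last = step == steps - 1
        next_mass = {}
        for q, m in mass.items():
            moving = 0.0 if last or not session_next.get(q) else (1.0 - eps) * m
            clicking = m - moving
            for doc, p in clicks[q].items():
                absorbed[doc] = absorbed.get(doc, 0.0) + clicking * p
            for q2, p in session_next.get(q, {}).items():
                next_mass[q2] = next_mass.get(q2, 0.0) + moving * p
        mass = next_mass
    return absorbed


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_refinements(refinements, session_next, clicks, threshold=0.5):
    """Greedy grouping (an assumed stand-in for a real clustering step)."""
    dists = {r: absorption_distribution(r, session_next, clicks)
             for r in refinements}
    clusters = []
    for r in refinements:
        for c in clusters:
            # Compare against the first member of each existing cluster.
            if cosine(dists[r], dists[c[0]]) >= threshold:
                c.append(r)
                break
        else:
            clusters.append([r])
    return clusters


# Hypothetical refinements of the query "mars"; all numbers are made up.
refinements = ["mars planet", "mars candy", "mars rover", "mars bar"]
clicks = {
    "mars planet": {"doc_space_a": 0.7, "doc_space_b": 0.3},
    "mars rover": {"doc_space_a": 0.8, "doc_space_c": 0.2},
    "mars candy": {"doc_candy_a": 1.0},
    "mars bar": {"doc_candy_a": 0.6, "doc_candy_b": 0.4},
}
session_next = {
    "mars planet": {"mars rover": 1.0},
    "mars rover": {"mars planet": 1.0},
    "mars candy": {"mars bar": 1.0},
    "mars bar": {"mars candy": 1.0},
}

clusters = cluster_refinements(refinements, session_next, clicks)
print(clusters)
```

In this toy setup, refinements whose walks end at overlapping documents receive similar absorption distributions and are grouped together, which is how click and session signals can jointly suggest a shared intent.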
