Context-aware query suggestion by mining click-through and session data

Query suggestion plays an important role in improving the usability of search engines. Although some recently proposed methods can make meaningful query suggestions by mining query patterns from search logs, none of them are context-aware - they do not take into account the immediately preceding queries as context in query suggestion. In this paper, we propose a novel context-aware query suggestion approach which is in two steps. In the offine model-learning step, to address data sparseness, queries are summarized into concepts by clustering a click-through bipartite. Then, from session data a concept sequence suffix tree is constructed as the query suggestion model. In the online query suggestion step, a user's search context is captured by mapping the query sequence submitted by the user to a sequence of concepts. By looking up the context in the concept sequence sufix tree, our approach suggests queries to the user in a context-aware manner. We test our approach on a large-scale search log of a commercial search engine containing 1:8 billion search queries, 2:6 billion clicks, and 840 million query sessions. The experimental results clearly show that our approach outperforms two baseline methods in both coverage and quality of suggestions.

[1]  Clement T. Yu,et al.  An effective approach to document retrieval via utilizing WordNet and recognizing phrases , 2004, SIGIR '04.

[2]  Paul-Alexandru Chirita,et al.  Personalized query expansion for the web , 2007, SIGIR.

[3]  Yen-Jen Oyang,et al.  Relevant term suggestion in interactive web search based on contextual information in query session logs , 2003, J. Assoc. Inf. Sci. Technol..

[4]  Reiner Kraft,et al.  Mining anchor text for query refinement , 2004, WWW '04.

[5]  Ji-Rong Wen,et al.  Clustering user queries of a search engine , 2001, WWW '01.

[6]  Doug Beeferman,et al.  Agglomerative clustering of a search engine query log , 2000, KDD '00.

[7]  Ricardo A. Baeza-Yates,et al.  Extracting semantic relations from query logs , 2007, KDD '07.

[8]  Qiming Chen,et al.  PrefixSpan,: mining sequential patterns efficiently by prefix-projected pattern growth , 2001, Proceedings 17th International Conference on Data Engineering.

[9]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[10]  Ryen W. White,et al.  Studying the use of popular destinations to enhance web search interaction , 2007, SIGIR.

[11]  Benjamin Rey,et al.  Generating query substitutions , 2006, WWW '06.

[12]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[13]  Mark Magennis,et al.  The potential and actual effectiveness of interactive query expansion , 1997, SIGIR '97.

[14]  Wei-Ying Ma,et al.  Probabilistic query expansion using query logs , 2002, WWW '02.

[15]  Ricardo A. Baeza-Yates,et al.  Query Recommendation Using Query Logs in Search Engines , 2004, EDBT Workshops.

[16]  Mehran Sahami,et al.  A web-based kernel function for measuring the similarity of short text snippets , 2006, WWW '06.

[17]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[18]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[19]  Charles L. A. Clarke,et al.  Scoring missing terms in information retrieval tasks , 2004, CIKM '04.

[20]  Umeshwar Dayal,et al.  PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth , 2001, ICDE 2001.

[21]  Berthier A. Ribeiro-Neto,et al.  Concept-based interactive query expansion , 2005, CIKM '05.

[22]  Daniel A. Keim,et al.  Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering , 1999, VLDB.