Efficient query recommendations in the long tail via center-piece subgraphs

We present a recommendation method based on the well-known concept of center-piece subgraph that allows time- and space-efficient generation of suggestions even for rare, i.e., long-tail, queries. Our method scales with respect to both the size of the datasets from which the model is computed and the heavy workloads that current web search engines must handle. In essence, we relate the terms contained in queries to highly correlated queries in a query-flow graph. This enables a novel recommendation generation method able to produce recommendations for approximately 99% of the workload of a real-world search engine. The method is based on a graph with term nodes, query nodes, and two kinds of connections: term-query and query-query. The first connects a term to the queries that contain it; the second connects two query nodes if the likelihood that a user submits the second query after having issued the first is sufficiently high. On such a large graph we need to compute the center-piece subgraph induced by the terms contained in queries. To reduce the cost of this computation, we introduce a novel and efficient method based on an inverted index representation of the model. We evaluate our solution on two real-world query logs and show that its effectiveness is comparable to (and in some cases better than) that of state-of-the-art methods for head queries. More importantly, the quality of the generated recommendations remains very high for long-tail queries as well, where other methods fail to produce any suggestion at all. Finally, we extensively investigate scalability and efficiency issues and show the viability of our method in real-world search engines.
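The graph and index described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_graph`, `random_walk_with_restart`, `build_index`, `recommend`), the threshold `min_prob`, the restart probability, and the toy data are all assumptions made for illustration. The sketch also assumes that center-piece scores are obtained by running a random walk with restart from each term node and multiplying the per-term scores, and that the "inverted index" stores one precomputed list of scored queries per term.

```python
from collections import defaultdict

def build_graph(queries, transitions, min_prob=0.2):
    """Build {node: {successor: weight}} with term->query and query->query edges."""
    graph = defaultdict(dict)
    for q in queries:
        graph[("q", q)]  # make sure every query node exists
        for term in set(q.split()):
            graph[("t", term)][("q", q)] = 1.0  # term is contained in query
    for (src, dst), prob in transitions.items():
        if prob >= min_prob:  # keep only sufficiently likely reformulations
            graph[("q", src)][("q", dst)] = prob
    return graph

def random_walk_with_restart(graph, start, restart=0.15, iterations=50):
    """Approximate the stationary scores of a walk that restarts at `start`."""
    scores = {start: 1.0}
    for _ in range(iterations):
        nxt = defaultdict(float)
        nxt[start] += restart
        for node, mass in scores.items():
            out = graph.get(node, {})
            total = sum(out.values())
            if total == 0:  # dangling node: send its mass back to the start
                nxt[start] += (1 - restart) * mass
                continue
            for succ, weight in out.items():
                nxt[succ] += (1 - restart) * mass * weight / total
        scores = nxt
    return scores

def build_index(graph, top_n=10):
    """Precompute, for every term, a posting list {query: score} (assumed layout)."""
    index = {}
    for kind, term in list(graph):
        if kind != "t":
            continue
        scores = random_walk_with_restart(graph, ("t", term))
        postings = {n[1]: s for n, s in scores.items() if n[0] == "q"}
        best = sorted(postings, key=postings.get, reverse=True)[:top_n]
        index[term] = {q: postings[q] for q in best}
    return index

def recommend(index, query, k=3):
    """Combine the posting lists of the query's terms by multiplying scores."""
    lists = [index[t] for t in set(query.split()) if t in index]
    if not lists:
        return []
    combined = {}
    for q in lists[0]:
        score = 1.0
        for postings in lists:
            score *= postings.get(q, 0.0)
        if score > 0:
            combined[q] = score
    return sorted(combined, key=combined.get, reverse=True)[:k]

# Toy data (hypothetical): four logged queries and their transition likelihoods.
queries = ["cheap flights", "cheap flights rome", "rome hotels", "flights rome"]
transitions = {
    ("cheap flights", "cheap flights rome"): 0.5,
    ("flights rome", "rome hotels"): 0.4,
    ("cheap flights rome", "rome hotels"): 0.3,
    ("rome hotels", "cheap flights"): 0.05,  # below threshold, dropped
}
graph = build_graph(queries, transitions)
index = build_index(graph)
print(recommend(index, "flights hotels"))  # a query never seen in the log
```

Because recommendations are assembled from per-term lists rather than looked up by whole query, a query that never appears in the log can still receive suggestions as long as its terms do, which is the property the abstract relies on for long-tail coverage.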
