Mining broad latent query aspects from search sessions

Search queries are typically very short, so they are often underspecified or carry senses that the user did not anticipate. A broad latent query aspect is a set of keywords that succinctly represents one particular sense, or one particular information need, of such a query and can help users reformulate it. We extract broad latent aspects from query reformulations found in historical search session logs. We propose a framework under which extracting these aspects reduces to optimizing a formal objective function subject to two constraints: the total number of aspects the system can store, and the number of aspects that can be shown in response to any given query. We present algorithms to find a good set of aspects, and to pick the best k aspects matching any query. Empirical results on real-world search engine logs show significant gains over a strong baseline that uses single-keyword reformulations: gains of 14% and 23% in human-judged accuracy and click-through data respectively, and around 20% in consistency among aspects predicted for "similar" queries. These results demonstrate both the importance of broad query aspects and the efficacy of our algorithms for extracting them.
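To make the "pick the best k aspects" step concrete, the following is a minimal sketch only. The abstract does not specify the objective function or the selection algorithm, so this example substitutes a generic greedy maximum-coverage heuristic: it assumes an aspect "covers" a historical reformulation when they share at least one keyword, and it repeatedly picks the aspect that covers the most not-yet-covered reformulations. The names `covers` and `pick_top_k` and the sample data are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, List, Sequence

Aspect = FrozenSet[str]          # an aspect is a set of keywords
Reformulation = FrozenSet[str]   # keywords a user added when reformulating


def covers(aspect: Aspect, reformulation: Reformulation) -> bool:
    """Assumed coverage rule: the aspect shares a keyword with the reformulation."""
    return bool(aspect & reformulation)


def pick_top_k(
    aspects: Sequence[Aspect],
    reformulations: Sequence[Reformulation],
    k: int,
) -> List[Aspect]:
    """Greedy max-coverage: pick up to k aspects covering the most reformulations."""
    candidates = list(aspects)
    uncovered = list(reformulations)
    chosen: List[Aspect] = []
    for _ in range(k):
        best, best_gain = None, 0
        for aspect in candidates:
            gain = sum(1 for r in uncovered if covers(aspect, r))
            if gain > best_gain:
                best, best_gain = aspect, gain
        if best is None:  # no remaining aspect covers anything new
            break
        chosen.append(best)
        candidates.remove(best)
        uncovered = [r for r in uncovered if not covers(best, r)]
    return chosen


# Hypothetical data for a query such as "camera".
aspects = [
    frozenset({"price", "cheap"}),
    frozenset({"review", "rating"}),
    frozenset({"cheap"}),
]
reformulations = [
    frozenset({"price"}),
    frozenset({"cheap"}),
    frozenset({"review"}),
    frozenset({"manual"}),
]
top2 = pick_top_k(aspects, reformulations, k=2)
```

With this sample data the multi-keyword aspect {"price", "cheap"} is selected before the single-keyword aspect {"cheap"}, because it covers more of the observed reformulations. This is an illustration of why broad aspects can outperform single keywords under a coverage-style objective, not a reproduction of the paper's results.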
