Query session detection as a cascade

We propose a cascading method for query session detection, the problem of identifying series of consecutive queries that a user submits with the same information need. While existing session detection research mostly deals with effectiveness, we also focus on efficiency and investigate the trade-off between the two: how expensive, in terms of runtime, is a certain improvement in F-Measure? In this regard, we distinguish two major scenarios in which query session knowledge is important. (1) In an online setting, the search engine tries to incorporate knowledge of the preceding queries to improve retrieval performance. Here the efficiency of the session detection method is crucial, since it should add as little as possible to the overall retrieval time. (2) In an offline post-retrieval setting, search engine logs are divided into sessions in order to examine, for example, what causes users to fail or which reformulation patterns are typical. Here efficiency might not be as important as in the online scenario, but the accuracy of the detected sessions is essential. Our cascading method provides a sensible treatment for both scenarios. It involves different steps that form a cascade in the sense that computationally costly, and hence time-consuming, features are applied only after cheap features have "failed," that is, have not led to a decision. This differs from previous session detection methods, most of which evaluate many features simultaneously. Experiments on a standard test corpus show that the cascading method saves runtime compared to the state of the art while detecting sessions with even higher accuracy.
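The cascade principle can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the two steps (`step_lexical`, `step_ngram`), their thresholds, and all function names are hypothetical stand-ins chosen for illustration. The only idea taken from the abstract is the control flow, in which a costlier step runs only when every cheaper step has left the decision open.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A step returns True (same session), False (new session), or None (undecided).
Step = Callable[[str, str], Optional[bool]]


def step_lexical(prev: str, curr: str) -> Optional[bool]:
    """Hypothetical cheap step: one query's term set contains the other's."""
    a, b = set(prev.lower().split()), set(curr.lower().split())
    if a and b and (a <= b or b <= a):
        return True
    return None  # the cheap feature "failed": pass on to the next step


def char_ngrams(text: str, n: int = 3) -> set:
    """Set of character n-grams of the lowercased text."""
    t = text.lower()
    return {t[i:i + n] for i in range(len(t) - n + 1)}


def step_ngram(prev: str, curr: str,
               hi: float = 0.5, lo: float = 0.05) -> Optional[bool]:
    """Hypothetical costlier step: Jaccard overlap of character 3-grams.

    The thresholds hi and lo are illustrative assumptions, not tuned values.
    """
    a, b = char_ngrams(prev), char_ngrams(curr)
    if not a or not b:
        return None
    jaccard = len(a & b) / len(a | b)
    if jaccard >= hi:
        return True
    if jaccard <= lo:
        return False
    return None


def same_session(prev: str, curr: str, steps: Sequence[Step],
                 default: bool = False) -> Tuple[bool, int]:
    """Run steps in order of cost; stop at the first one that decides.

    Returns the decision and the index of the deciding step
    (len(steps) if the default was used).
    """
    for i, step in enumerate(steps):
        decision = step(prev, curr)
        if decision is not None:
            return decision, i
    return default, len(steps)


def segment(queries: Sequence[str], steps: Sequence[Step]) -> List[List[str]]:
    """Split one user's chronologically ordered queries into sessions."""
    sessions: List[List[str]] = []
    for q in queries:
        if sessions and same_session(sessions[-1][-1], q, steps)[0]:
            sessions[-1].append(q)
        else:
            sessions.append([q])
    return sessions


STEPS = [step_lexical, step_ngram]  # ordered from cheap to costly
```

In an online setting one might truncate `STEPS` to the cheap steps only, while an offline log analysis could append further, more expensive steps. Recording the index of the deciding step makes it possible to measure how often each level of the cascade is actually reached.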
