Evaluating the effectiveness of search task trails

In this paper, we introduce the "task trail" as a new concept for understanding user search behavior. We define a task as an atomic user information need. Web search logs have been studied mainly at the session or query level, yet users may submit several queries within one task and handle several tasks within one session. Although previous studies have addressed the problem of task identification, little is known about the advantage of using tasks over sessions and queries for search applications. We conduct extensive analyses and comparisons to evaluate the effectiveness of task trails in three search applications: determining user satisfaction, predicting user search interests, and suggesting queries. Experiments are conducted on large-scale datasets from a commercial search engine. The results show that: (1) sessions and queries are not as precise as tasks in determining user satisfaction; (2) task trails provide higher web page utility to users than other sources; (3) tasks represent atomic user information needs and can therefore preserve topic similarity between query pairs; (4) task-based query suggestion can provide results complementary to those of other models. These findings verify the need to extract task trails from web search logs and suggest potential applications in search and recommendation systems.
