Learning to extract cross-session search tasks

Search tasks, comprising a series of search queries serving the same information need, have recently been recognized as an accurate atomic unit for modeling user search intent. Most prior research in this area has focused on short-term search tasks within a single search session and has depended heavily on human annotations for supervised classification model learning. In this work, we target the identification of long-term, or cross-session, search tasks (transcending session boundaries) by investigating inter-query dependencies learned from users' searching behaviors. A semi-supervised clustering model is proposed based on the latent structural SVM framework, and a set of effective automatic annotation rules is proposed as weak supervision to relieve the burden of manual annotation. Experimental results based on a large-scale search log collected from Bing.com confirm the effectiveness of the proposed model in identifying cross-session search tasks and the utility of the introduced weak supervision signals. Our learned model enables a more comprehensive understanding of users' search behaviors via search logs and facilitates the development of dedicated search-engine support for long-term tasks.
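To make the idea of grouping queries by inter-query dependencies concrete, the following is a minimal illustrative sketch, not the model proposed in this work. It assumes a simple best-link scheme in which each query attaches to its most similar earlier query or starts a new task. The word-overlap similarity, the `threshold` value, and the function names are all hypothetical stand-ins; the actual model learns its scoring function within the latent structural SVM framework.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two queries (assumed stand-in for a learned score)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def best_link_tasks(queries: List[str], threshold: float = 0.1) -> List[int]:
    """Assign a task id to each query, processing queries in log order.

    Each query links to its most similar earlier query. If that similarity
    is below `threshold` (an assumed value), the query starts a new task.
    Session boundaries are deliberately ignored, so tasks may span sessions.
    """
    task_ids: List[int] = []
    next_task = 0
    for i, query in enumerate(queries):
        best_j, best_sim = -1, 0.0
        for j in range(i):
            sim = jaccard(query, queries[j])
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j >= 0 and best_sim >= threshold:
            task_ids.append(task_ids[best_j])  # join the linked query's task
        else:
            task_ids.append(next_task)  # no strong link: open a new task
            next_task += 1
    return task_ids


# Hypothetical log with two interleaved tasks (travel planning and Python syntax).
log = [
    "cheap flights to paris",
    "python list comprehension",
    "paris hotels near louvre",
    "flights to paris in june",
    "python dict comprehension",
]
print(best_link_tasks(log))  # [0, 1, 0, 0, 1]
```

In this toy example the travel queries and the programming queries end up in separate tasks even though they are interleaved, which is the behavior a cross-session task extractor aims for.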
