Identifying and labeling search tasks via query-based Hawkes processes

We consider a search task to be a set of queries that serve the same user information need. Analyzing search tasks in user query streams plays an important role in building modern tools that improve search engine performance. In this paper, we propose a probabilistic method for identifying and labeling search tasks based on two intuitive observations: queries that users issue close together in time across many query sequences are likely to belong to the same search task, and different users with the same information need tend to submit topically coherent queries. To capture these intuitions, we directly model query temporal patterns using a special class of point processes called Hawkes processes, and we combine topic models with Hawkes processes to identify and label search tasks simultaneously. Essentially, Hawkes processes use their self-exciting property to identify search tasks when influence exists among the queries in an individual user's sequence, while the topic model exploits query co-occurrence across different users to discover the latent information needed for labeling search tasks. More importantly, Hawkes processes and the topic model reinforce each other within the unified model, which enhances the performance of both. We evaluate our method on both synthetic data and real-world query log data. In addition, we apply our model to query clustering and search task identification. Comparisons with state-of-the-art methods show that the improvement achieved by our approach is consistent and promising.
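The self-exciting behavior mentioned in the abstract can be illustrated with a minimal sketch. It assumes a univariate Hawkes process with an exponential triggering kernel, so that the intensity is mu + sum over earlier events of alpha * beta * exp(-beta * (t - t_i)). The kernel form, the parameter values, and the function names `intensity` and `parent_probabilities` are illustrative assumptions, not the paper's model, and the topic-model component is omitted entirely.

```python
import math


def intensity(t, history, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity at time t given earlier event times.

    mu is the background rate; each earlier event adds a bump that
    decays exponentially (assumed kernel, chosen for illustration).
    """
    return mu + sum(
        alpha * beta * math.exp(-beta * (t - s)) for s in history if s < t
    )


def parent_probabilities(times, mu=0.1, alpha=0.5, beta=1.0):
    """For each event j, return a normalized list
    [p_background, p_triggered_by_event_0, ..., p_triggered_by_event_{j-1}].

    Assumes `times` is sorted in strictly increasing order.
    """
    rows = []
    for j, t in enumerate(times):
        # Weight of the background rate, then of each earlier event's kernel.
        weights = [mu] + [
            alpha * beta * math.exp(-beta * (t - s)) for s in times[:j]
        ]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


if __name__ == "__main__":
    # Hypothetical query timestamps (arbitrary time units) for one user.
    query_times = [0.0, 0.1, 10.0]
    for j, row in enumerate(parent_probabilities(query_times)):
        best = max(range(len(row)), key=row.__getitem__)
        source = "background" if best == 0 else f"query {best - 1}"
        print(f"query {j}: most likely source = {source}")
```

Under these assumed parameters, a query issued shortly after another is attributed mostly to that earlier query, which is the kind of influence a task-identification step could use to link queries, while a query issued long afterwards is attributed mostly to the background rate and would start a new group.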
