Using query log and social tagging to refine queries based on latent topics

An important way to improve users' satisfaction in Web search is to help them issue more effective queries. One such approach is query refinement (reformulation), which generates new queries from the query the user has just issued. A common procedure is to first generate a set of candidate queries and then apply a scoring method to assess their quality. Most existing methods are context based: they rely heavily on the contextual relations among terms in historical queries, and they cannot detect or maintain the semantic consistency of a query. In this paper, we propose a graphical model for scoring queries. The model exploits a latent topic space, derived automatically from the query log, to assess the semantic dependency among the terms of a query. It considers both term context dependency and topic context dependency, which also makes it feasible to score queries for which little historical term context information is available. We also use social tagging data in the candidate query generation process. Based on the observation that different users may tag the same resource with different tags of similar meaning, we propose a method to mine such term pairs for constructing new candidate queries.
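To make the tag-pair observation concrete, the following is a minimal sketch of one way such pairs could be mined; it is not the method of the paper, whose details are not given in this abstract. It assumes the tagging data is available as (user, resource, tag) triples. The function name `mine_tag_pairs`, the `min_support` threshold, and the toy data are all hypothetical. Two tags form a candidate pair when two different users each apply one of them, and not the other, to the same resource; a pair is kept if this happens on at least `min_support` resources.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_tag_pairs(annotations, min_support=2):
    """Return {(tag_a, tag_b): number of resources supporting the pair}.

    annotations: iterable of (user, resource, tag) triples (assumed format).
    min_support: hypothetical threshold on the number of supporting resources.
    """
    # Group tags as resource -> user -> set of tags.
    by_resource = defaultdict(lambda: defaultdict(set))
    for user, resource, tag in annotations:
        by_resource[resource][user].add(tag.lower())

    support = Counter()
    for users in by_resource.values():
        pairs_here = set()
        # Compare every pair of distinct users who tagged this resource.
        for (_, tags1), (_, tags2) in combinations(users.items(), 2):
            # Keep only tags that one user chose and the other did not.
            for t1 in tags1 - tags2:
                for t2 in tags2 - tags1:
                    pairs_here.add(tuple(sorted((t1, t2))))
        # Count each resource at most once per pair.
        for pair in pairs_here:
            support[pair] += 1

    return {p: c for p, c in support.items() if c >= min_support}


# Toy data, invented for illustration only.
annotations = [
    ("u1", "r1", "film"), ("u2", "r1", "movie"),
    ("u1", "r2", "film"), ("u3", "r2", "movie"),
    ("u2", "r3", "python"), ("u3", "r3", "code"),
]
pairs = mine_tag_pairs(annotations)
print(pairs)
```

On this toy input only the pair ("film", "movie") survives, because it is supported by two resources, while ("code", "python") appears on a single resource. Raw co-tagging of this kind also surfaces merely related tags, so a real system would presumably need a further similarity filter before substituting one term for the other in a candidate query.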
