Entropy-biased models for query representation on the click graph

Query log analysis has received substantial attention in recent years, and the click graph is an important technique for describing the relationship between queries and URLs. However, state-of-the-art approaches that model the click graph with raw click frequencies do not eliminate noise, nor do they handle heterogeneous query-URL pairs well. In this paper, we investigate and develop a novel entropy-biased framework for modeling click graphs. The intuition behind this model is that different query-URL pairs should be treated differently: common clicks on less frequent but more specific URLs are of greater value than common clicks on frequent and general URLs. Based on this intuition, we utilize the entropy information of the URLs and introduce a new concept, the inverse query frequency (IQF), to weigh the importance (discriminative ability) of a click on a certain URL. The IQF weighting scheme has never been explicitly explored or statistically examined for any bipartite graph in the information retrieval literature. We not only formally define and quantify this scheme, but also combine it with the click frequency and user frequency information on the click graph to obtain an effective query representation. To illustrate our methodology, we conduct experiments on the AOL query log data for query similarity analysis and query suggestion tasks. Experimental results demonstrate that our entropy-biased models yield considerable improvements in performance. Moreover, our method can also be applied to other bipartite graphs.
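The weighting idea can be made concrete with a small sketch. The abstract does not give the exact IQF formula, so the code below assumes an IDF-style form, IQF(u) = log(|Q| / n(u)), where |Q| is the number of distinct queries and n(u) is the number of distinct queries with at least one click on URL u. The function names (`iqf_weights`, `cf_iqf_vectors`, `cosine`), the toy click counts, and the example.com URLs are all hypothetical and are not taken from the paper or from the AOL log.

```python
import math
from collections import defaultdict


def iqf_weights(clicks):
    """Assumed IDF-style inverse query frequency per URL.

    clicks maps (query, url) -> click count.
    """
    queries = {q for (q, _u), c in clicks.items() if c > 0}
    queries_per_url = defaultdict(set)
    for (q, u), c in clicks.items():
        if c > 0:
            queries_per_url[u].add(q)
    n_queries = len(queries)
    # A URL clicked from every query gets weight log(1) = 0.
    return {u: math.log(n_queries / len(qs)) for u, qs in queries_per_url.items()}


def cf_iqf_vectors(clicks):
    """Represent each query as a sparse vector of click frequency times IQF."""
    iqf = iqf_weights(clicks)
    vectors = defaultdict(dict)
    for (q, u), c in clicks.items():
        if c > 0:
            vectors[q][u] = c * iqf[u]
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[u] for u, w in a.items() if u in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy click graph: the portal URL is clicked from every query,
# while the other two URLs are more specific.
clicks = {
    ("map", "maps.example.com"): 5,
    ("map", "portal.example.com"): 5,
    ("directions", "maps.example.com"): 4,
    ("directions", "portal.example.com"): 3,
    ("weather", "portal.example.com"): 6,
    ("weather", "weather.example.com"): 2,
}

iqf = iqf_weights(clicks)
vectors = cf_iqf_vectors(clicks)
sim_related = cosine(vectors["map"], vectors["directions"])
sim_unrelated = cosine(vectors["map"], vectors["weather"])
```

In this toy graph, "map" and "weather" share only the general portal URL. With raw click counts their cosine similarity would be well above zero, whereas under the assumed IQF weighting the shared general URL contributes nothing, so only the shared specific URL between "map" and "directions" counts toward similarity.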
