Smoothing Click Counts for Aggregated Vertical Search

Clickthrough data is a critical feature for improving web search ranking. Recently, many search portals have provided aggregated search, which retrieves relevant information from various heterogeneous collections called verticals. In addition to the well-known problem of rank bias, clickthrough data recorded in an aggregated search environment suffers from severe sparseness because only a limited number of results is presented for each vertical. This skew in clickthrough data, which we call rank cut, makes optimization of vertical searches more difficult. In this work, we focus on mitigating the negative effect of rank cut for aggregated vertical searches. We introduce a technique for smoothing click counts based on spectral graph analysis. Using real clickthrough data from a vertical recorded in an aggregated search environment, we show empirically that clickthrough data smoothed by this technique is effective for improving the vertical search.
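
The abstract does not spell out the smoothing procedure, so the following is only an illustrative sketch of one common graph-based approach, not the method used in this work. It assumes a symmetric document similarity matrix (hypothetical here) and spreads sparse click counts over the graph by minimizing a Laplacian-regularized objective. The function name `smooth_click_counts`, the parameter `alpha`, and the toy data are all assumptions introduced for illustration.

```python
import numpy as np

def smooth_click_counts(clicks, similarity, alpha=1.0):
    """Illustrative Laplacian smoothing (an assumption, not the paper's method).

    Minimizes ||f - y||^2 + alpha * f^T L f, where y holds the raw click
    counts and L = D - W is the unnormalized Laplacian of the similarity
    graph W. The minimizer solves (I + alpha * L) f = y.
    """
    y = np.asarray(clicks, dtype=float)
    W = np.asarray(similarity, dtype=float)
    # Unnormalized graph Laplacian: degree matrix minus adjacency weights.
    L = np.diag(W.sum(axis=1)) - W
    # Closed-form solution of the regularized least-squares problem.
    return np.linalg.solve(np.eye(len(y)) + alpha * L, y)

# Toy example: three documents in a chain; only the first (the one shown
# above the rank cut) has recorded clicks.
raw_clicks = [10, 0, 0]
similarity = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
smoothed = smooth_click_counts(raw_clicks, similarity, alpha=1.0)
print(smoothed)
```

In this toy setting the documents that were never displayed receive a share of the click mass from their neighbor, and because the similarity matrix is symmetric the total click count is unchanged. A larger `alpha` spreads the counts more evenly across connected documents.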