Improving Relevance Prediction by Addressing Biases and Sparsity in Web Search Click Data

In this paper, we present our approach to, and findings from, the 2012 Yandex Relevance Prediction Challenge. Our approach has two goals. First, we aim to address four types of bias, namely position-bias, perception-bias, query-bias, and session-bias, to better interpret the clickthrough information. Second, we aim to address clickthrough sparsity by exploiting various back-off strategies. We use gradient boosted regression trees to combine the different features and model the interactions among them. Our final submission ranked 3rd (AUC 0.6635) among the prize-eligible participants on the first subset of test queries but dropped to 8th (AUC 0.6536) on the second (hidden) subset, potentially due to over-fitting. We also discuss our post-competition efforts to address this issue through cross-validation and more careful model selection.
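
To illustrate the general idea of backing off under click sparsity, the following is a minimal sketch and not the method used in our submission. It assumes a hypothetical log of (query, url, clicked) records and a hypothetical threshold `min_impressions`: the click rate is estimated at the query-URL level when enough impressions exist, otherwise at the URL level, and otherwise from the global click rate. The function names and the hierarchy of levels are assumptions made for illustration only.

```python
from collections import defaultdict


def build_counts(log):
    """Aggregate [clicks, impressions] at three levels of granularity.

    `log` is an iterable of (query, url, clicked) tuples (hypothetical format).
    """
    pair = defaultdict(lambda: [0, 0])  # (query, url) -> [clicks, impressions]
    url = defaultdict(lambda: [0, 0])   # url -> [clicks, impressions]
    total = [0, 0]                      # global [clicks, impressions]
    for q, u, clicked in log:
        for counter in (pair[(q, u)], url[u], total):
            counter[0] += int(clicked)
            counter[1] += 1
    return pair, url, total


def backoff_ctr(q, u, pair, url, total, min_impressions=5):
    """Return the click rate from the most specific level with enough data."""
    levels = (pair.get((q, u), (0, 0)), url.get(u, (0, 0)), total)
    for clicks, impressions in levels:
        if impressions >= min_impressions:
            return clicks / impressions
    # Not enough data at any level: use the global rate if it is defined.
    return total[0] / total[1] if total[1] else 0.0


if __name__ == "__main__":
    log = (
        [("a", "x", True)] * 6
        + [("b", "x", False)] * 2
        + [("c", "y", True), ("c", "y", False)]
    )
    pair, url, total = build_counts(log)
    print(backoff_ctr("a", "x", pair, url, total))  # query-URL level
    print(backoff_ctr("b", "x", pair, url, total))  # backs off to URL level
    print(backoff_ctr("c", "y", pair, url, total))  # backs off to global rate
```

Estimates of this kind could then serve as input features to a learner such as gradient boosted regression trees, which combines them with other signals.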
