Recently, there has been considerable interest in using historical logged user interaction data (queries and clicks) for counterfactual evaluation of search systems [8,10]. Recent approaches attempt to de-bias the historical log data by conducting randomization experiments and modeling the bias in user behavior. Thus far, the focus has been on bias arising from the position of the clicked document (position bias) or from the sparsity of clicks on certain query-document pairs (selection bias). However, there is another source of bias: the context in which a document was presented to the user. The propensity of a user clicking on a document depends not only on its position but also on many other contextual factors. In this work, we show that existing counterfactual estimators fail to capture one such bias, namely the effect on click-through rates of the relevance of the documents ranked above. We further propose a modification to the existing estimator that accounts for this bias. We rely on full result randomization, which allows us to control for the click context at various ranks, and we demonstrate the effectiveness of our methods for evaluating retrieval systems through experiments on a simulation setup designed to cover a wide variety of scenarios.
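For context, the position-bias-only baseline that the abstract critiques is typically an inverse-propensity-scoring (IPS) estimator in the style of [2]: each click is up-weighted by the inverse probability that its rank was examined. The following is a minimal sketch of that standard estimator, not the paper's proposed method; the function name, log format, and propensity values are illustrative assumptions.

```python
def ips_estimate(clicks, propensities):
    """IPS estimate of expected clicks on relevant results per query.

    clicks       -- list of (query_id, rank, clicked) tuples from the log
    propensities -- dict mapping rank -> P(rank examined); illustrative values
    """
    queries = {q for q, _, _ in clicks}
    total = 0.0
    for _, rank, clicked in clicks:
        if clicked:
            # De-bias the position effect: a click at a rarely examined rank
            # counts for more than a click at the top position.
            total += 1.0 / propensities[rank]
    return total / len(queries)

# Illustrative position-bias curve: examination probability decays with rank.
propensities = {1: 0.9, 2: 0.6, 3: 0.4}
log = [
    ("q1", 1, True), ("q1", 2, False), ("q1", 3, True),
    ("q2", 1, False), ("q2", 2, True), ("q2", 3, False),
]
print(ips_estimate(log, propensities))
```

Note that the propensity here depends only on rank; the abstract's point is precisely that it should also depend on the relevance of the documents ranked above, which this estimator ignores.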
[1] Olivier Chapelle, et al. Expected reciprocal rank for graded relevance. CIKM, 2009.
[2] Thorsten Joachims, et al. Unbiased Learning-to-Rank with Biased Feedback. WSDM, 2016.
[3] Benjamin Piwowarski, et al. A user browsing model to predict search engine click data from past observations. SIGIR, 2008.
[4] Nick Craswell, et al. An experimental comparison of click position-bias models. WSDM, 2008.
[5] Yisong Yue, et al. Beyond position bias: examining result attractiveness as a source of presentation bias in clickthrough data. WWW, 2010.
[6] Marc Najork, et al. Position Bias Estimation for Unbiased Learning to Rank in Personal Search. WSDM, 2018.
[7] M. de Rijke, et al. Click Models for Web Search. 2015.
[8] Filip Radlinski, et al. Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search. TOIS, 2007.
[9] Ben Carterette, et al. Offline Comparative Evaluation with Incremental, Minimally-Invasive Online Feedback. SIGIR, 2018.
[10] Marc Najork, et al. Learning to Rank with Selection Bias in Personal Search. SIGIR, 2016.
[11] Olivier Chapelle, et al. A dynamic bayesian network click model for web search ranking. WWW, 2009.