Estimating Clickthrough Bias in the Cascade Model

Historical logs of user interactions (queries and clicks) have attracted considerable interest as a basis for evaluating search systems through counterfactual analysis [8,10]. Recent approaches attempt to de-bias the logged data by conducting randomization experiments and modeling the bias in user behavior. Thus far, the focus has been on bias that arises from the position of the clicked document (position bias) or from the sparsity of clicks on certain query-document pairs (selection bias). However, there is another source of bias: the context in which a document is presented to the user. The propensity of a user to click on a document depends not only on its position but also on many other contextual factors. In this work, we show that existing counterfactual estimators fail to capture one such type of bias, namely the effect that the relevance of documents ranked above has on a document's click-through rate. We further propose a modification to the existing estimator that accounts for this bias. We rely on full result randomization, which allows us to control for the click context at various ranks, and we demonstrate the effectiveness of our methods for evaluating retrieval systems through experiments on a simulation setup designed to cover a wide variety of scenarios.
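To make the context effect concrete, the following is a minimal sketch, assuming the basic cascade model in which the user scans results from the top, clicks the first result found attractive, and then stops. The function name `cascade_click_probs` and the relevance values are hypothetical and chosen only for illustration; this is not the estimator proposed in this work.

```python
from typing import List


def cascade_click_probs(relevance: List[float]) -> List[float]:
    """Click probability at each rank under a basic cascade model.

    Assumption (illustrative): relevance[k] is the probability that the
    user clicks the document at rank k once it is examined, and the user
    stops scanning after the first click.
    """
    probs = []
    p_examine = 1.0  # the top rank is always examined
    for r in relevance:
        probs.append(p_examine * r)  # examined and clicked
        p_examine *= 1.0 - r         # examined, not clicked, user moves on
    return probs


# The same document (relevance 0.5) at the same rank (2), shown below
# a weak and a strong top result.
after_weak = cascade_click_probs([0.1, 0.5])
after_strong = cascade_click_probs([0.9, 0.5])
print(after_weak[1], after_strong[1])  # roughly 0.45 versus 0.05
```

The document and its position are identical in both rankings, yet its click probability differs by roughly a factor of nine. A correction that depends on position alone cannot account for this gap under these assumptions.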