Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking

Counterfactual evaluation can estimate click-through rate (CTR) differences between ranking systems from historical interaction data, while correcting for position bias and item-selection bias. We introduce the novel Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy used for logging data so that the counterfactual estimate has minimal variance. Because lower variance leads to faster convergence, LogOpt increases the data efficiency of counterfactual estimation. LogOpt turns the counterfactual approach, which is indifferent to the logging policy, into an online approach in which the algorithm decides what rankings to display. We prove that, as an online evaluation method, LogOpt is unbiased with respect to position and item-selection bias, unlike existing interleaving methods. Furthermore, we perform large-scale experiments that simulate comparisons between thousands of rankers. Our results show that, whereas interleaving methods make systematic errors, LogOpt is as efficient as interleaving without being biased.
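
As a concrete illustration of the kind of estimator the abstract describes (not the paper's actual implementation), the Python sketch below computes an inverse-propensity-scored CTR estimate for a candidate ranker from clicks logged under a different policy. The function name, the propensity values, and the toy data are all assumptions made for the example.

import numpy as np

def ips_ctr_estimate(clicks, exam_propensities, new_ranker_exam_probs):
    """
    Inverse-propensity-scored estimate of the CTR a candidate ranker
    would obtain, computed from clicks logged under another policy.

    clicks                -- 1 if the logged item was clicked, else 0
    exam_propensities     -- P(user examined the item at its logged rank)
    new_ranker_exam_probs -- P(user would examine the item at the rank
                             the candidate ranker assigns it)
    """
    clicks = np.asarray(clicks, dtype=float)
    p_log = np.asarray(exam_propensities, dtype=float)
    p_new = np.asarray(new_ranker_exam_probs, dtype=float)
    # Each click is reweighted by how likely examination is under the
    # candidate ranker relative to the logging policy, which corrects
    # for position bias in the logged data.
    return np.mean(clicks * p_new / p_log)

# Toy comparison of two rankers A and B on the same logged interactions
# (all numbers are illustrative assumptions).
clicks = [1, 0, 1, 0, 0, 1]
logged = [0.9, 0.5, 0.3, 0.9, 0.5, 0.3]   # examination propensities at logging time
rank_a = [0.9, 0.3, 0.5, 0.5, 0.9, 0.3]   # examination probabilities under ranker A
rank_b = [0.3, 0.9, 0.9, 0.3, 0.5, 0.5]   # examination probabilities under ranker B
delta = ips_ctr_estimate(clicks, logged, rank_a) - ips_ctr_estimate(clicks, logged, rank_b)
print(f"Estimated CTR difference (A - B): {delta:+.3f}")

In this picture, LogOpt's role is to choose the logging policy, and hence the logged examination propensities, so that the variance of exactly this kind of reweighted estimate is minimized, which is what turns the otherwise policy-indifferent counterfactual estimator into an online method.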
