Unifying Online and Counterfactual Learning to Rank: A Novel Counterfactual Estimator that Effectively Utilizes Online Interventions

Optimizing ranking systems based on user interactions is a well-studied problem. State-of-the-art methods are divided into online approaches, which learn by interacting with users directly, and counterfactual approaches, which learn from historical interaction logs. Existing online methods depend on online interventions and are therefore hindered when applied counterfactually; conversely, counterfactual methods cannot directly benefit from online interventions. We propose a novel intervention-aware estimator for both counterfactual and online Learning to Rank (LTR). With this estimator we aim to bridge the online/counterfactual LTR divide, as it proves highly effective in both scenarios. The estimator corrects for position bias, trust bias, and item-selection bias by applying corrections based on the behavior of the logging policy and on online interventions: changes made to the logging policy while click data is being gathered. Our experimental results, obtained in a semi-synthetic experimental setup, show that, unlike existing counterfactual LTR methods, the intervention-aware estimator can greatly benefit from online interventions.
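The abstract describes the estimator only at a high level. The sketch below illustrates one plausible reading of it, assuming an affine (alpha/beta) click model for position and trust bias and a top-k display model for item-selection bias; the names ALPHA, BETA, expected_alpha_beta, and intervention_aware_estimate, as well as the exact correction formula, are illustrative assumptions and not the paper's definition.

```python
import numpy as np

# Assumed rank-dependent click model (illustrative values):
# P(click | rank k, relevant)     = ALPHA[k] + BETA[k]
# P(click | rank k, not relevant) = BETA[k]            (trust bias)
ALPHA = np.array([0.98, 0.80, 0.60, 0.45, 0.30])
BETA = np.array([0.30, 0.15, 0.08, 0.04, 0.02])


def expected_alpha_beta(doc, policies):
    """Average the rank-based correction terms over every logging policy
    that was deployed while clicks were gathered (i.e., over the online
    interventions), instead of using a single static logging policy."""
    alphas, betas = [], []
    for policy in policies:
        # Assumed interface: policy(doc) returns the probability of showing
        # `doc` at each of the top-k ranks; ranks beyond the top-k get zero
        # probability, which models item-selection bias.
        rank_probs = policy(doc)
        alphas.append(np.dot(rank_probs, ALPHA))
        betas.append(np.dot(rank_probs, BETA))
    return np.mean(alphas), np.mean(betas)


def intervention_aware_estimate(doc, clicks, impressions, policies):
    """IPS-style affine estimate of a document's relevance from logged clicks.

    clicks, impressions: click and impression counts of `doc` in the log.
    policies: the sequence of logging policies active during logging.
    """
    exp_alpha, exp_beta = expected_alpha_beta(doc, policies)
    # Affine correction: subtract the expected trust-bias term, then divide
    # by the expected examination weight induced by the mixture of deployed
    # logging policies.
    return (clicks / impressions - exp_beta) / exp_alpha
```

The design point this sketch is meant to convey, following the abstract, is that the correction terms are computed from all logging policies deployed during data gathering, so every online intervention directly changes the propensity weights rather than being ignored by the estimator.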
