Unifying Online and Counterfactual Learning to Rank: A Novel Counterfactual Estimator that Effectively Utilizes Online Interventions (Extended Abstract)

State-of-the-art Learning to Rank (LTR) methods for optimizing ranking systems based on user interactions are divided into online approaches, which learn by interacting directly with users, and counterfactual approaches, which learn from historical interaction logs. We propose a novel intervention-aware estimator that bridges this online/counterfactual division. The estimator corrects for position bias, trust bias, and item-selection bias using corrections based on the behavior of the logging policy and on online interventions: changes made to the logging policy while click data is being gathered. Our experimental results show that, unlike existing counterfactual LTR methods, the intervention-aware estimator can benefit greatly from online interventions. To the best of our knowledge, this is the first method shown to be highly effective in both online and counterfactual scenarios.

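To make the correction concrete, below is a minimal Python sketch, not the authors' implementation, of an intervention-aware style estimate under an affine click model, P(C = 1 | d, k) = alpha_k * P(R = 1 | d) + beta_k, where k is the rank at which document d is displayed, alpha_k captures position bias and beta_k captures trust bias. All names (ranks_by_policy, policy_share, etc.) are hypothetical, and the sketch omits the item-selection (top-k) correction. The key idea it illustrates: the bias parameters are averaged over every policy deployed during logging, so the estimator naturally accounts for online interventions.

def intervention_aware_estimate(clicks, impressions,
                                ranks_by_policy, policy_share,
                                alpha, beta):
    # Expected bias parameters for this document under the *mixture* of
    # deployed logging policies, weighted by the fraction of traffic each
    # policy served (policy_share sums to 1). Each policy is assumed to
    # rank the document deterministically at ranks_by_policy[i].
    exp_alpha = sum(w * alpha[k] for w, k in zip(policy_share, ranks_by_policy))
    exp_beta = sum(w * beta[k] for w, k in zip(policy_share, ranks_by_policy))
    # Affine correction: subtract the expected trust-bias click rate, then
    # divide by the expected examination weight (an IPS-style reweighting).
    ctr = clicks / impressions
    return (ctr - exp_beta) / exp_alpha

# Example (all numbers illustrative): the first policy shows the document
# at rank 0; an online intervention switches to a policy showing it at
# rank 2, with traffic split evenly between the two.
alpha = [1.0, 0.7, 0.5]    # assumed per-rank examination weights
beta = [0.1, 0.05, 0.02]   # assumed per-rank trust-bias click rates
est = intervention_aware_estimate(clicks=450, impressions=1000,
                                  ranks_by_policy=[0, 2],
                                  policy_share=[0.5, 0.5],
                                  alpha=alpha, beta=beta)
print(est)  # corrected relevance estimate, here (0.45 - 0.06) / 0.75 = 0.52

Under the affine model, the observed click-through rate has expectation exp_alpha * P(R = 1 | d) + exp_beta whenever the serving policy is independent of document relevance, so the returned quantity is an unbiased relevance estimate under these assumptions.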