Propensity-Independent Bias Recovery in Offline Learning-to-Rank Systems

Learning-to-rank systems often utilize user-item interaction data (e.g., clicks) to provide users with high-quality rankings. However, this data suffers from several biases, and if naively used as training data, it can lead to suboptimal ranking algorithms. Most existing bias-correcting methods focus on position bias, the fact that higher-ranked results are more likely to receive interaction, and address this bias by leveraging inverse propensity weighting. However, it is not always possible to accurately estimate propensity scores, and in addition to position bias, selection bias is often encountered in real-world recommender systems. Selection bias occurs because users are exposed to a truncated list of results, which gives a zero chance for some items to be observed and, therefore, interacted with, even if they are relevant. Here, we propose a new counterfactual method that uses a two-stage correction approach and jointly addresses selection and position bias in learning-to-rank systems without relying on propensity scores. Our experimental results show that our method is better than state-of-the-art propensity-independent methods and either better than or comparable to methods that make the strong assumption for which the propensity model is known.

[1]  Thorsten Joachims,et al.  Unbiased Learning-to-Rank with Biased Feedback , 2016, WSDM.

[2]  Yifan Zhang,et al.  Correcting for Selection Bias in Learning-to-rank Systems , 2020, WWW.

[3]  Thorsten Joachims,et al.  Interactively optimizing information retrieval systems as a dueling bandits problem , 2009, ICML '09.

[4]  Chun Guo,et al.  Counterfactual Learning to Rank using Heterogeneous Treatment Effect Estimation , 2020, ArXiv.

[5]  Filip Radlinski,et al.  Large-scale validation and analysis of interleaved search evaluation , 2012, TOIS.

[6]  Illtyd Trethowan Causality , 1938 .

[7]  M. de Rijke,et al.  Multileaved Comparisons for Fast Online Evaluation , 2014, CIKM.

[8]  M. de Rijke,et al.  To Model or to Intervene: A Comparison of Counterfactual and Online Learning to Rank from User Interactions , 2019, SIGIR.

[9]  Yuta Saito,et al.  Unbiased Pairwise Learning from Biased Implicit Feedback , 2020, ICTIR.

[10]  M. de Rijke,et al.  When Inverse Propensity Scoring does not Work: Affine Corrections for Unbiased Learning to Rank , 2020, CIKM.

[11]  Yang Wang,et al.  Unbiased LambdaMART: An Unbiased Pairwise Learning-to-Rank Algorithm , 2018, WWW.

[12]  Michael Bendersky,et al.  Addressing Trust Bias for Unbiased Learning-to-Rank , 2019, WWW.

[13]  M. de Rijke,et al.  Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking , 2020, ICTIR.

[14]  Thorsten Joachims,et al.  Accurately Interpreting Clickthrough Data as Implicit Feedback , 2017 .

[15]  Katja Hofmann,et al.  Reusing historical interaction data for faster online learning to rank for IR , 2013, DIR.

[16]  Thorsten Joachims,et al.  A General Framework for Counterfactual Learning-to-Rank , 2018, SIGIR.

[17]  Yin Zhang,et al.  Unbiased Implicit Recommendation and Propensity Estimation via Combinational Joint Learning , 2020, RecSys.

[18]  M. de Rijke,et al.  Multileave Gradient Descent for Fast Online Learning to Rank , 2016, WSDM.

[19]  Thorsten Joachims,et al.  Learning Socially Optimal Information Systems from Egoistic Users , 2013, ECML/PKDD.

[20]  Marc Najork,et al.  Learning to Rank with Selection Bias in Personal Search , 2016, SIGIR.

[21]  W. Bruce Croft,et al.  Unbiased Learning to Rank with Unbiased Propensity Estimation , 2018, SIGIR.

[22]  J. Heckman Sample selection bias as a specification error , 1979 .

[23]  M. de Rijke,et al.  Differentiable Unbiased Online Learning to Rank , 2018, CIKM.

[24]  Olivier Chapelle,et al.  A dynamic bayesian network click model for web search ranking , 2009, WWW '09.

[25]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[26]  M. de Rijke,et al.  Policy-Aware Unbiased Learning to Rank for Top-k Rankings , 2020, SIGIR.

[27]  Marc Najork,et al.  Position Bias Estimation for Unbiased Learning to Rank in Personal Search , 2018, WSDM.

[28]  Yi Chang,et al.  Yahoo! Learning to Rank Challenge Overview , 2010, Yahoo! Learning to Rank Challenge.

[29]  Zoubin Ghahramani,et al.  Probabilistic Matrix Factorization with Non-random Missing Data , 2014, ICML.

[30]  Thorsten Joachims,et al.  Recommendations as Treatments: Debiasing Learning and Evaluation , 2016, ICML.

[31]  Thorsten Joachims,et al.  Estimating Position Bias without Intrusive Interventions , 2018, WSDM.

[32]  D. Blei,et al.  Causal Inference for Recommender Systems , 2020, RecSys.

[33]  Chih-Jen Lin,et al.  Improving Ad Click Prediction by Considering Non-displayed Events , 2019, CIKM.

[34]  Matthew Richardson,et al.  Predicting clicks: estimating the click-through rate for new ads , 2007, WWW '07.

[35]  M. de Rijke,et al.  A Neural Click Model for Web Search , 2016, WWW.

[36]  Nick Craswell,et al.  An experimental comparison of click position-bias models , 2008, WSDM '08.

[37]  Jui-Yang Hsia,et al.  Unbiased Ad Click Prediction for Position-aware Advertising Systems , 2020, RecSys.