When Inverse Propensity Scoring does not Work: Affine Corrections for Unbiased Learning to Rank

Besides position bias, which has been well studied, trust bias is another type of bias prevalent in user interactions with rankings: users are more likely to click incorrectly w.r.t. their preferences on highly ranked items because they trust the ranking system. While previous work has observed this behavior in users, we prove that existing Counterfactual Learning to Rank (CLTR) methods do not remove this bias, including methods specifically designed to mitigate it. Moreover, we prove that Inverse Propensity Scoring (IPS) is in principle unable to correct for trust bias under non-trivial circumstances. Our main contribution is a new estimator based on affine corrections: it both reweights clicks and penalizes items displayed on ranks with high trust bias. Our estimator is the first that is proven to remove the effect of both trust bias and position bias. Furthermore, we show that our estimator is a generalization of the existing CLTR framework: if no trust bias is present, it reduces to the original IPS estimator. Our semi-synthetic experiments indicate that by removing the effect of trust bias in addition to position bias, CLTR can approximate the optimal ranking system even more closely than previously possible.
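
The idea of reweighting clicks and penalizing ranks can be illustrated with a minimal sketch. It assumes an affine click model of the form P(click | rank k) = alpha_k * relevance + beta_k; this form and the names `alpha_k`, `beta_k`, `affine_estimate`, `ips_estimate`, and `simulate_mean` are assumptions made for illustration and are not spelled out in the text above. Under that assumption, subtracting `beta_k` (the penalty) and dividing by `alpha_k` (the reweighting) recovers relevance in expectation, whereas reweighting alone does not when `beta_k` is non-zero.

```python
import random


def affine_estimate(click, alpha_k, beta_k):
    """Affine-corrected click: subtract the rank penalty, then reweight."""
    return (click - beta_k) / alpha_k


def ips_estimate(click, alpha_k):
    """Reweighting only; equals the affine form when beta_k is 0."""
    return click / alpha_k


def simulate_mean(relevance, alpha_k, beta_k, n=200_000, seed=0):
    """Average both estimators over n clicks drawn from the assumed model."""
    rng = random.Random(seed)
    # Assumed affine click model (hypothetical parameters).
    p_click = alpha_k * relevance + beta_k
    affine_sum = 0.0
    ips_sum = 0.0
    for _ in range(n):
        click = 1 if rng.random() < p_click else 0
        affine_sum += affine_estimate(click, alpha_k, beta_k)
        ips_sum += ips_estimate(click, alpha_k)
    return affine_sum / n, ips_sum / n


if __name__ == "__main__":
    affine_mean, ips_mean = simulate_mean(relevance=0.3, alpha_k=0.6, beta_k=0.1)
    print("true relevance: 0.3")
    print("affine estimate:", round(affine_mean, 3))
    print("reweighting-only estimate:", round(ips_mean, 3))
```

In this toy setting the reweighting-only average converges to relevance + beta_k / alpha_k rather than to relevance, which is one way to see why a multiplicative correction alone cannot absorb an additive click offset. Setting `beta_k = 0` makes the two functions return identical values, mirroring the reduction to IPS described above.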
