Unbiased Learning to Rank via Propensity Ratio Scoring

Implicit feedback, such as user clicks, is a major source of supervision for estimating learning to rank (LTR) models in modern retrieval systems. However, the inherent bias in such feedback greatly restricts the quality of the learnt ranker. Recent advances in unbiased LTR leverage Inverse Propensity Scoring (IPS) to tackle this bias. Though effective, IPS only corrects the bias introduced by treating clicked documents as relevant; it cannot handle the bias caused by treating unclicked documents as irrelevant. Because a non-click does not necessarily indicate irrelevance (the document might not have been examined), IPS-based methods inevitably include loss from comparisons between relevant-relevant document pairs, which directly limits the effectiveness of ranking model learning. In this work, we first prove that in an LTR algorithm based on pairwise comparisons, only pairs with different labels (e.g., relevant-irrelevant pairs in the binary case) should contribute to the loss function. The proof implies that existing IPS-based methods yield sub-optimal results in practice. We then derive a new weighting scheme called Propensity Ratio Scoring (PRS) that treats clicks and non-clicks jointly. Besides correcting the bias in clicked documents, PRS avoids relevant-relevant comparisons in LTR training in expectation and has lower variability. Our empirical study confirms that PRS makes more effective use of click data in various situations, which leads to superior performance on an extensive set of LTR benchmarks.
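
To make the contrast between the two weighting schemes concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not state the PRS formula, so the ratio weight used here (examination propensity of the unclicked document divided by that of the clicked document) is an assumption suggested by the method's name. The function names, the logistic pairwise loss, and the toy propensities are likewise illustrative.

```python
import math


def pairwise_logistic(s_i, s_j):
    """Logistic loss for preferring document i over document j."""
    return math.log1p(math.exp(-(s_i - s_j)))


def ips_pairwise_loss(scores, clicks, propensities):
    """IPS-style loss: each (clicked i, unclicked j) pair is weighted by
    1 / propensity[i]. The unclicked document's propensity is ignored, so
    a relevant but unexamined document j is penalised at full weight."""
    total = 0.0
    for i, clicked_i in enumerate(clicks):
        if not clicked_i:
            continue
        for j, clicked_j in enumerate(clicks):
            if clicked_j:
                continue
            total += pairwise_logistic(scores[i], scores[j]) / propensities[i]
    return total


def ratio_pairwise_loss(scores, clicks, propensities):
    """ASSUMED ratio weighting (not taken from the abstract): each pair is
    weighted by propensity[j] / propensity[i], so non-clicks that were
    unlikely to be examined contribute little to the loss."""
    total = 0.0
    for i, clicked_i in enumerate(clicks):
        if not clicked_i:
            continue
        for j, clicked_j in enumerate(clicks):
            if clicked_j:
                continue
            weight = propensities[j] / propensities[i]
            total += weight * pairwise_logistic(scores[i], scores[j])
    return total


# Toy query: three documents with equal scores, a click only at the top rank.
scores = [0.0, 0.0, 0.0]
clicks = [1, 0, 0]
propensities = [1.0, 0.5, 0.1]  # hypothetical examination probabilities

ips = ips_pairwise_loss(scores, clicks, propensities)
ratio = ratio_pairwise_loss(scores, clicks, propensities)
print(f"IPS-weighted loss:   {ips:.4f}")
print(f"ratio-weighted loss: {ratio:.4f}")
```

Because every propensity is at most 1, the ratio weight never exceeds the IPS weight for the same pair. Under this assumed form, the pair involving the rarely examined third document is scaled down the most, which is one way to read the claim that non-clicks should not be treated uniformly as irrelevant.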
