Addressing Trust Bias for Unbiased Learning-to-Rank

Existing unbiased learning-to-rank models use counterfactual inference, notably Inverse Propensity Scoring (IPS), to learn a ranking function from biased click data. They handle the click incompleteness bias, but usually assume that the clicks are noise-free, i.e., a clicked document is always assumed to be relevant. In this paper, we relax this unrealistic assumption and study click noise explicitly in the unbiased learning-to-rank setting. Specifically, we model the noise as the position-dependent trust bias and propose a noise-aware Position-Based Model, named TrustPBM, to better capture user click behavior. We propose an Expectation-Maximization algorithm to estimate both examination and trust bias from click data in TrustPBM. Furthermore, we show that it is difficult to use a pure IPS method to incorporate click noise and thus propose a novel method that combines a Bayes rule application with IPS for unbiased learning-to-rank. We evaluate our proposed methods on three personal search data sets and demonstrate that our proposed model can significantly outperform the existing unbiased learning-to-rank methods.

[1]  W. Bruce Croft,et al.  Unbiased Learning to Rank with Unbiased Propensity Estimation , 2018, SIGIR.

[2]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[3]  Yisong Yue,et al.  Beyond position bias: examining result attractiveness as a source of presentation bias in clickthrough data , 2010, WWW '10.

[4]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[5]  John Langford,et al.  Doubly Robust Policy Evaluation and Learning , 2011, ICML.

[6]  Thorsten Joachims,et al.  Recommendations as Treatments: Debiasing Learning and Evaluation , 2016, ICML.

[7]  Thorsten Joachims,et al.  Effective Evaluation Using Logged Bandit Feedback from Multiple Loggers , 2017, KDD.

[8]  Thomas Nedelec,et al.  Offline A/B Testing for Recommender Systems , 2018, WSDM.

[9]  Thorsten Joachims,et al.  Batch learning from logged bandit feedback through counterfactual risk minimization , 2015, J. Mach. Learn. Res..

[10]  M. de Rijke,et al.  Click Models for Web Search , 2015, Click Models for Web Search.

[11]  Filip Radlinski,et al.  Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search , 2007, TOIS.

[12]  Thorsten Joachims,et al.  The Self-Normalized Estimator for Counterfactual Learning , 2015, NIPS.

[13]  Christopher J. C. Burges,et al.  From RankNet to LambdaRank to LambdaMART: An Overview , 2010 .

[14]  Cheng Li,et al.  The LambdaLoss Framework for Ranking Metric Optimization , 2018, CIKM.

[15]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[16]  Chao Liu,et al.  Click chain model in web search , 2009, WWW '09.

[17]  Olivier Chapelle,et al.  A dynamic bayesian network click model for web search ranking , 2009, WWW '09.

[18]  Mark T. Keane,et al.  Modeling user behavior using a search-engine , 2007, IUI '07.

[19]  M. J. D. Powell,et al.  Weighted Uniform Sampling — a Monte Carlo Technique for Reducing Variance , 1966 .

[20]  Benjamin Piwowarski,et al.  A user browsing model to predict search engine click data from past observations. , 2008, SIGIR '08.

[21]  Thorsten Joachims,et al.  Accurately interpreting clickthrough data as implicit feedback , 2005, SIGIR '05.

[22]  Ben Carterette,et al.  Estimating Clickthrough Bias in the Cascade Model , 2018, CIKM.

[23]  Jaana Kekäläinen,et al.  Cumulated gain-based evaluation of IR techniques , 2002, TOIS.

[24]  Marc Najork,et al.  Learning to Rank with Selection Bias in Personal Search , 2016, SIGIR.

[25]  Thorsten Joachims,et al.  Consistent Position Bias Estimation without Online Interventions for Learning-to-Rank , 2018, ArXiv.

[26]  Joaquin Quiñonero Candela,et al.  Counterfactual reasoning and learning systems: the example of computational advertising , 2013, J. Mach. Learn. Res..

[27]  Marc Najork,et al.  Position Bias Estimation for Unbiased Learning to Rank in Personal Search , 2018, WSDM.

[28]  Lihong Li,et al.  Counterfactual Estimation and Optimization of Click Metrics for Search Engines , 2014, ArXiv.

[29]  Wei Chu,et al.  Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms , 2010, WSDM '11.

[30]  Nick Craswell,et al.  An experimental comparison of click position-bias models , 2008, WSDM '08.

[31]  D. Rubin,et al.  The central role of the propensity score in observational studies for causal effects , 1983 .

[32]  Ben Carterette,et al.  Offline Comparative Evaluation with Incremental, Minimally-Invasive Online Feedback , 2018, SIGIR.

[33]  Tie-Yan Liu Learning to Rank for Information Retrieval , 2009, Found. Trends Inf. Retr..

[34]  Matthew Richardson,et al.  Predicting clicks: estimating the click-through rate for new ads , 2007, WWW '07.