Unbiased Learning-to-Rank with Biased Feedback

Implicit feedback (e.g., clicks, dwell times, etc.) is an abundant source of data in human-interactive systems. While implicit feedback has many advantages (e.g., it is inexpensive to collect, user centric, and timely), its inherent biases are a key obstacle to its effective use. For example, position bias in search rankings strongly influences how many clicks a result receives, so that directly using click data as a training signal in Learning-to-Rank (LTR) methods yields sub-optimal results. To overcome this bias problem, we present a counterfactual inference framework that provides the theoretical basis for unbiased LTR via Empirical Risk Minimization despite biased data. Using this framework, we derive a Propensity-Weighted Ranking SVM for discriminative learning from implicit feedback, where click models take the role of the propensity estimator. In contrast to most conventional approaches to de-biasing the data using click models, this allows training of ranking functions even in settings where queries do not repeat. Beyond the theoretical support, we show empirically that the proposed learning method is highly effective in dealing with biases, that it is robust to noise and propensity model misspecification, and that it scales efficiently. We also demonstrate the real-world applicability of our approach on an operational search engine, where it substantially improves retrieval performance.

[1]  Mike Thelwall,et al.  Synthesis Lectures on Information Concepts, Retrieval, and Services , 2009 .

[2]  Tie-Yan Liu,et al.  Learning to rank for information retrieval , 2009, SIGIR.

[3]  Yue Wang,et al.  Beyond Ranking: Optimizing Whole-Page Presentation , 2016, WSDM.

[4]  M. de Rijke,et al.  A Neural Click Model for Web Search , 2016, WWW.

[5]  D. Rubin,et al.  Causal Inference for Statistics, Social, and Biomedical Sciences: Sensitivity Analysis and Bounds , 2015 .

[6]  Thorsten Joachims,et al.  Interactively optimizing information retrieval systems as a dueling bandits problem , 2009, ICML '09.

[7]  Jimmy J. Lin,et al.  A cascade ranking model for efficient ranked retrieval , 2011, SIGIR.

[8]  Matthew Richardson,et al.  Predicting clicks: estimating the click-through rate for new ads , 2007, WWW '07.

[9]  Roderick J. A. Little,et al.  Statistical Analysis with Missing Data: Little/Statistical Analysis with Missing Data , 2002 .

[10]  Thorsten Joachims,et al.  Training linear SVMs in linear time , 2006, KDD '06.

[11]  M. de Rijke,et al.  Click Models for Web Search , 2015, Click Models for Web Search.

[12]  Filip Radlinski,et al.  Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search , 2007, TOIS.

[13]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[14]  Nicole A. Lazar,et al.  Statistical Analysis With Missing Data , 2003, Technometrics.

[15]  Thorsten Joachims,et al.  Recommendations as Treatments: Debiasing Learning and Evaluation , 2016, ICML.

[16]  M. de Rijke,et al.  An Introduction to Click Models for Web Search: SIGIR 2015 Tutorial , 2015, SIGIR.

[17]  Yisong Yue,et al.  Beyond position bias: examining result attractiveness as a source of presentation bias in clickthrough data , 2010, WWW '10.

[18]  Filip Radlinski,et al.  Large-scale validation and analysis of interleaved search evaluation , 2012, TOIS.

[19]  Wei Chu,et al.  Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms , 2010, WSDM '11.

[20]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[21]  M. de Rijke,et al.  Multileave Gradient Descent for Fast Online Learning to Rank , 2016, WSDM.

[22]  Katja Hofmann,et al.  Reusing historical interaction data for faster online learning to rank for IR , 2013, DIR.

[23]  Lihong Li,et al.  Learning from Logged Implicit Exploration Data , 2010, NIPS.

[24]  Nick Craswell,et al.  An experimental comparison of click position-bias models , 2008, WSDM '08.

[25]  C. J. van Rijsbergen,et al.  Report on the need for and provision of an 'ideal' information retrieval test collection , 1975 .

[26]  Olivier Chapelle,et al.  A dynamic bayesian network click model for web search ranking , 2009, WWW '09.

[27]  Marc Najork,et al.  Position Bias Estimation for Unbiased Learning to Rank in Personal Search , 2018, WSDM.

[28]  Thorsten Joachims,et al.  Stable Coactive Learning via Perturbation , 2013, ICML.

[29]  Thorsten Joachims,et al.  Unbiased Comparative Evaluation of Ranking Functions , 2016, ICTIR.

[30]  Thorsten Joachims,et al.  Batch learning from logged bandit feedback through counterfactual risk minimization , 2015, J. Mach. Learn. Res..

[31]  K. Pearson,et al.  Biometrika , 1902, The American Naturalist.

[32]  Thorsten Joachims,et al.  Learning Socially Optimal Information Systems from Egoistic Users , 2013, ECML/PKDD.

[33]  John Langford,et al.  Exploration scavenging , 2008, ICML '08.

[34]  R. Fildes Journal of the American Statistical Association : William S. Cleveland, Marylyn E. McGill and Robert McGill, The shape parameter for a two variable graph 83 (1988) 289-300 , 1989 .

[35]  D. Rubin,et al.  The central role of the propensity score in observational studies for causal effects , 1983 .

[36]  Marc Najork,et al.  Learning to Rank with Selection Bias in Personal Search , 2016, SIGIR.

[37]  D. Horvitz,et al.  A Generalization of Sampling Without Replacement from a Finite Universe , 1952 .