Unbiased Learning to Rank: Counterfactual and Online Approaches

This tutorial covers Unbiased Learning to Rank, a recent research field that aims to learn unbiased user preferences from biased user interactions. We provide an overview of the two main families of methods in the field, Counterfactual Learning to Rank (CLTR) and Online Learning to Rank (OLTR), and their underlying theory. The tutorial starts with a brief introduction to the general Learning to Rank (LTR) field and the difficulties user interactions pose for traditional supervised LTR methods. The second part covers CLTR, an LTR field that grew out of click models. Using an explicit model of user biases, CLTR methods correct for these biases in their learning process and can learn from historical data. Besides these methods, we also cover practical considerations, such as how certain biases can be estimated. The third part focuses on OLTR: methods that learn by interacting directly with users and deal with biases by adding stochasticity to the displayed results. We cover cascading bandits, dueling bandit techniques, and the most recent pairwise differentiable approach. Finally, the concluding part contrasts the two approaches, highlights their relative strengths and weaknesses, and presents future directions of research. For LTR practitioners, our comparison gives guidance on how to choose between methods. For the field of Information Retrieval (IR), we aim to provide an essential guide to understanding unbiased LTR and choosing between its methodologies.
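To make the idea of correcting for an explicit bias model concrete, the following is a minimal sketch of one common correction, inverse propensity scoring (IPS), under a position-based examination model. It is an illustration only and not necessarily the formulation used in the tutorial. The function names, the log format `(doc_id, rank, clicked)`, and the propensity values are all assumptions made for this example.

```python
from collections import defaultdict


def naive_click_rates(log):
    """Per-document click rate, ignoring the rank at which it was shown."""
    clicks = defaultdict(float)
    shown = defaultdict(int)
    for doc_id, _rank, clicked in log:
        clicks[doc_id] += 1.0 if clicked else 0.0
        shown[doc_id] += 1
    return {d: clicks[d] / shown[d] for d in shown}


def ips_click_rates(log, propensity):
    """Per-document click rate with each click weighted by 1 / P(examined | rank).

    `propensity` maps a rank to the assumed probability that a user
    examines that rank (hypothetical values, normally estimated from data).
    """
    weighted = defaultdict(float)
    shown = defaultdict(int)
    for doc_id, rank, clicked in log:
        if clicked:
            weighted[doc_id] += 1.0 / propensity[rank]
        shown[doc_id] += 1
    return {d: weighted[d] / shown[d] for d in shown}


# Hypothetical log: "a" is always shown at rank 1, "b" always at rank 2.
# Rank 2 is assumed to be examined only half as often as rank 1.
propensity = {1: 1.0, 2: 0.5}
log = (
    [("a", 1, True)] * 2 + [("a", 1, False)] * 2
    + [("b", 2, True)] * 1 + [("b", 2, False)] * 3
)

naive = naive_click_rates(log)
corrected = ips_click_rates(log, propensity)
print(naive)
print(corrected)
```

In this toy log the naive click rate makes "b" look half as attractive as "a", while the IPS-weighted estimate treats the two as equal once the lower examination probability at rank 2 is accounted for. In practice the propensities are not known and have to be estimated, which is one of the practical considerations the abstract mentions.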
