Unbiased Learning to Rank: Theory and Practice

Implicit user feedback (such as clicks and dwell time) is an important data source for modern search engines. Although heavily biased [8, 16, 17, 18], it is cheap to collect and particularly useful for user-centric retrieval applications such as search ranking and query recommendation. Understanding the biases inherent in current systems and designing learning-to-rank algorithms that can learn effectively from implicit user feedback without inheriting those biases is an important research direction that can significantly improve the quality of modern search engines. To develop such an unbiased learning-to-rank (ULTR) system, previous studies have constructed probabilistic graphical models (e.g., click models) based on user behavior hypotheses to extract unbiased relevance signals with which to train ranking systems. More recently, a counterfactual learning framework that estimates examination propensities and uses them for unbiased learning to rank has attracted much attention in both academia and industry. Despite their popularity, there has been no systematic comparison and analysis of the graphical-model and counterfactual ULTR frameworks. In this tutorial, we provide an overview of the fundamental mechanisms and algorithms for unbiased learning to rank. We describe and analyze the theory behind each learning framework and give detailed instructions on how to conduct unbiased learning to rank in practice.
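To make the contrast concrete, here is a minimal sketch in standard notation (the symbols are illustrative and not taken from any specific paper in the reference list). Click models built on the examination hypothesis, such as the position-based model, factor the probability of a click on result x in a ranked list \pi_q for query q into examination and relevance:

\[ P(c_x = 1 \mid \pi_q) \,=\, P(e_x = 1 \mid \pi_q) \cdot P(r_x = 1 \mid q) \]

The counterfactual framework reuses the examination probability as a propensity and reweights each clicked result by its inverse, so that for any additive rank-based metric \lambda (e.g., \lambda(r) = 1/r for reciprocal rank) the inverse-propensity-scoring estimator

\[ \hat{\Delta}_{\mathrm{IPS}}(f \mid \pi_q) \,=\, \sum_{x \,:\, c_x = 1} \frac{\lambda(\mathrm{rank}(x \mid f))}{P(e_x = 1 \mid \pi_q)} \]

equals, in expectation over clicks, the metric computed on the truly relevant results. Training a ranker f against \hat{\Delta}_{\mathrm{IPS}} is therefore unbiased with respect to position bias, provided the propensities are estimated correctly.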

[1] Benjamin Piwowarski, et al. A user browsing model to predict search engine click data from past observations, 2008, SIGIR '08.

[2] Filip Radlinski, et al. Large-scale validation and analysis of interleaved search evaluation, 2012, TOIS.

[3] W. Bruce Croft, et al. A Deep Relevance Matching Model for Ad-hoc Retrieval, 2016, CIKM.

[4] M. de Rijke, et al. Multileave Gradient Descent for Fast Online Learning to Rank, 2016, WSDM.

[5] Thorsten Joachims, et al. Learning Socially Optimal Information Systems from Egoistic Users, 2013, ECML/PKDD.

[6] ChengXiang Zhai, et al. Content-aware click modeling, 2013, WWW '13.

[7] Yiqun Liu, et al. Incorporating Non-sequential Behavior into Click Models, 2015, SIGIR.

[8] Mark T. Keane, et al. Modeling Result-List Searching in the World Wide Web: The Role of Relevance Topologies and Trust Bias, 2006.

[9] Marc Najork, et al. Position Bias Estimation for Unbiased Learning to Rank in Personal Search, 2018, WSDM.

[10] Erick Cantú-Paz, et al. Temporal click model for sponsored search, 2010, SIGIR.

[11] Marc Najork, et al. Learning to Rank with Selection Bias in Personal Search, 2016, SIGIR.

[12] Alon Y. Halevy, et al. Crowdsourcing systems on the World-Wide Web, 2011, Commun. ACM.

[13] Yiqun Liu, et al. Training Deep Ranking Model with Weak Relevance Labels, 2017, ADC.

[14] Olivier Chapelle, et al. A dynamic Bayesian network click model for web search ranking, 2009, WWW '09.

[15] W. Bruce Croft, et al. Unbiased Learning to Rank with Unbiased Propensity Estimation, 2018, SIGIR.

[16] Filip Radlinski, et al. Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search, 2007, TOIS.

[17] Yisong Yue, et al. Beyond position bias: examining result attractiveness as a source of presentation bias in clickthrough data, 2010, WWW '10.

[18] Thorsten Joachims, et al. Accurately Interpreting Clickthrough Data as Implicit Feedback, 2017.

[19] Thorsten Joachims, et al. Counterfactual Risk Minimization: Learning from Logged Bandit Feedback, 2015, ICML.

[20] D. Rubin, et al. The central role of the propensity score in observational studies for causal effects, 1983.

[21] Nick Craswell, et al. Learning to Match using Local and Distributed Representations of Text for Web Search, 2016, WWW.

[22] Aniket Kittur, et al. Crowdsourcing user studies with Mechanical Turk, 2008, CHI.

[23] Tie-Yan Liu, et al. Learning to Rank for Information Retrieval, 2011.

[24] Thorsten Joachims, et al. Interactively optimizing information retrieval systems as a dueling bandits problem, 2009, ICML '09.

[25] W. Bruce Croft, et al. Neural Ranking Models with Weak Supervision, 2017, SIGIR.

[26] Thorsten Joachims, et al. Batch learning from logged bandit feedback through counterfactual risk minimization, 2015, J. Mach. Learn. Res.

[27] Jiafeng Guo, et al. Analysis of the Paragraph Vector Model for Information Retrieval, 2016, ICTIR.

[28] Krishna P. Gummadi, et al. Equity of Attention: Amortizing Individual Fairness in Rankings, 2018, SIGIR.