To Model or to Intervene: A Comparison of Counterfactual and Online Learning to Rank from User Interactions

Learning to Rank (LTR) from user interactions is challenging because user feedback often contains high levels of bias and noise. Two methodologies for dealing with bias currently prevail in the field of LTR: counterfactual methods, which learn from historical data and model user behavior to deal with biases; and online methods, which perform interventions to deal with bias but use no explicit user models. For practitioners, the choice between the two methodologies is important because it directly affects end users. Nevertheless, there has never been a direct comparison between these two approaches to unbiased LTR. In this study we provide the first benchmarking of both counterfactual and online LTR methods under different experimental conditions. Our results show that the choice between the methodologies is consequential and depends on the presence of selection bias and on the degree of position bias and interaction noise. In settings with little bias or noise, counterfactual methods can obtain the highest ranking performance; in other circumstances, however, their optimization can be detrimental to the user experience. Conversely, online methods are very robust to bias and noise but require control over the displayed rankings. Our findings both confirm and contradict existing expectations on the impact of model-based and intervention-based methods in LTR, and allow practitioners to make an informed decision between the two methodologies.
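To make the idea of "modeling user behavior to deal with biases" concrete, the following is a minimal sketch of inverse propensity weighting, one common way counterfactual approaches correct for position bias in logged clicks. It is an illustration under stated assumptions, not the estimator evaluated in this study: the examination model `(1 / rank) ** eta`, the parameter `eta`, the function names, and the toy click log are all hypothetical.

```python
from collections import defaultdict


def examination_propensity(rank, eta=1.0):
    """Assumed position-bias model: P(examined | rank) = (1 / rank) ** eta."""
    return (1.0 / rank) ** eta


def naive_click_rates(log):
    """Clicks per impression, ignoring the rank at which a document was shown."""
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for doc_id, _rank, clicked in log:
        impressions[doc_id] += 1
        clicks[doc_id] += int(clicked)
    return {doc: clicks[doc] / impressions[doc] for doc in impressions}


def ips_click_estimates(log, eta=1.0):
    """Inverse-propensity-weighted clicks per impression.

    Each click is weighted by 1 / P(examined | rank), so clicks observed at
    rarely examined ranks count for more.
    """
    weighted_clicks = defaultdict(float)
    impressions = defaultdict(int)
    for doc_id, rank, clicked in log:
        impressions[doc_id] += 1
        if clicked:
            weighted_clicks[doc_id] += 1.0 / examination_propensity(rank, eta)
    return {doc: weighted_clicks[doc] / impressions[doc] for doc in impressions}


# Hypothetical log of (doc_id, rank, clicked) tuples. Document "b" is always
# shown at rank 2, where the assumed model says it is examined half as often.
toy_log = [
    ("a", 1, True), ("a", 1, True), ("a", 1, False), ("a", 1, False),
    ("b", 2, True), ("b", 2, False), ("b", 2, False), ("b", 2, False),
]

naive = naive_click_rates(toy_log)
corrected = ips_click_estimates(toy_log)
```

In this toy log the naive click rate makes "b" look half as attractive as "a", while the weighted estimate treats them as equal, because the gap is fully explained by the assumed examination probabilities. The correction is only as good as the assumed propensities, which is one reason robustness to bias and noise differs between the two methodologies.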
