Cascade Model-based Propensity Estimation for Counterfactual Learning to Rank

Unbiased counterfactual learning to rank (CLTR) requires click propensities to compensate for the difference between user clicks and the true relevance of search results via inverse propensity scoring (IPS). Current propensity estimation methods assume that user click behavior follows the position-based click model (PBM) and estimate click propensities based on this assumption. In reality, however, user clicks often follow the cascade model (CM), where users scan search results from top to bottom and each next click depends on the previous one. In this cascade scenario, PBM-based propensity estimates are inaccurate, which, in turn, hurts CLTR performance. In this paper, we propose a propensity estimation method for the cascade scenario, called cascade model-based inverse propensity scoring (CM-IPS). We show that CM-IPS keeps CLTR performance close to the full-information performance when user clicks follow the CM, whereas PBM-based CLTR exhibits a significant gap from the full-information performance. The opposite holds when user clicks follow the PBM instead of the CM. Finally, we suggest a way to select between CM- and PBM-based propensity estimation based on historical user clicks.
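The core idea can be sketched concretely: under a strict cascade model, a result at rank k is examined only if no earlier result was clicked, so its examination propensity is the product of (1 − attractiveness) over the preceding ranks, and IPS then reweights each click by the inverse of that propensity. The following is a minimal illustrative sketch, not the paper's actual CM-IPS estimator; the per-rank attractiveness values `alpha` and the clipping threshold are hypothetical.

```python
import numpy as np

def cm_propensities(attractiveness):
    """Examination propensity per rank under a strict cascade model:
    rank k is examined only if none of the earlier results was clicked,
    i.e. prod_{j<k} (1 - alpha_j). Rank 1 is always examined."""
    a = np.asarray(attractiveness, dtype=float)
    # Shift the cumulative product by one rank: propensity at rank k
    # depends only on ranks 1..k-1.
    return np.concatenate(([1.0], np.cumprod(1.0 - a)[:-1]))

def ips_weights(clicks, propensities, clip=10.0):
    """Inverse propensity weights for observed clicks, clipped to
    control variance (a common practical safeguard, not from the paper)."""
    w = np.asarray(clicks, dtype=float) / np.asarray(propensities, dtype=float)
    return np.minimum(w, clip)

# Hypothetical attractiveness estimates for a 4-result ranked list.
alpha = [0.6, 0.3, 0.2, 0.1]
props = cm_propensities(alpha)          # [1.0, 0.4, 0.28, 0.224]
clicks = [0, 1, 0, 0]                   # user clicked the second result
print(ips_weights(clicks, props))       # [0.  2.5 0.  0. ]
```

Note how the propensity drops with rank even though no explicit per-position examination parameter is used, which is exactly where a PBM-based estimator and a CM-based estimator diverge.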
