Mixture-Based Correction for Position and Trust Bias in Counterfactual Learning to Rank

In counterfactual learning to rank (CLTR), user interactions are used as a source of supervision. Since user interactions are biased, an important line of research in this field develops methods to correct for this bias. Inverse propensity scoring (IPS) is a popular method for correcting position bias. Affine correction (AC) generalizes IPS to correct for both position bias and trust bias. IPS and AC provably remove bias, provided that the bias parameters are estimated accurately. Estimating the bias parameters, in turn, requires accurate estimates of the relevance probabilities. This cyclic dependency introduces practical limitations in terms of sensitivity, convergence and efficiency. We propose a new method for correcting position and trust bias in CLTR that, unlike existing methods, does not rely on relevance estimation. Our method, mixture-based correction (MBC), assumes that the distribution of click-through rates (CTRs) over the items being ranked is a mixture of two distributions: one for relevant items and one for non-relevant items. We prove that MBC is unbiased; the proof does not require accurate estimation of the bias parameters. Our experiments show that MBC, across different bias settings and combined with different LTR algorithms, outperforms AC, the state-of-the-art method for correcting position and trust bias, in some settings and performs on par with it in others. Furthermore, MBC trains orders of magnitude faster than AC.
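To make the quantities above concrete, the following Python sketch assumes the affine trust-bias click model on which AC is commonly based, P(click | d, k) = α_k · P(R = 1 | d) + β_k, with IPS as the special case β_k = 0. It shows (i) the affine correction itself and (ii) why, at a fixed rank, the CTRs of items plausibly form a two-component mixture: relevant items cluster around α_k + β_k and non-relevant items around β_k. The parameter values, the helper names (`affine_correction`, `simulate_ctrs`, `fit_two_gaussians`) and the use of EM on a 1-D Gaussian mixture are illustrative assumptions; this is not the estimation procedure of MBC itself.

```python
import math
import random

# ASSUMPTION: affine trust-bias click model with hypothetical per-rank parameters:
#   P(click | item at rank k) = alpha[k] * P(relevant) + beta[k]
# IPS corresponds to the special case beta[k] = 0 (alpha[k] = examination propensity).


def affine_correction(click, k, alpha, beta):
    """AC-style relevance estimate: subtract the trust-bias offset, then rescale."""
    return (click - beta[k]) / alpha[k]


def simulate_ctrs(relevance, k, alpha, beta, impressions, rng):
    """Empirical CTR of each item (binary relevance) shown `impressions` times at rank k."""
    ctrs = []
    for rel in relevance:
        p_click = alpha[k] * rel + beta[k]
        clicks = sum(rng.random() < p_click for _ in range(impressions))
        ctrs.append(clicks / impressions)
    return ctrs


def fit_two_gaussians(xs, iters=50):
    """EM for a 1-D, two-component Gaussian mixture (an illustrative stand-in)."""
    n = len(xs)
    mean = sum(xs) / n
    var0 = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    w, mu, var = [0.5, 0.5], [min(xs), max(xs)], [var0, var0]
    resp = []
    for _ in range(iters):
        # E-step: posterior component memberships, computed in log space.
        resp = []
        for x in xs:
            logp = [math.log(w[j]) - 0.5 * math.log(2 * math.pi * var[j])
                    - (x - mu[j]) ** 2 / (2 * var[j]) for j in range(2)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixing weights, means and variances.
        for j in range(2):
            nj = max(sum(r[j] for r in resp), 1e-12)
            w[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2
                             for r, x in zip(resp, xs)) / nj, 1e-6)
    return w, mu, var, resp


# Usage with made-up parameters for ranks 0, 1, 2.
alpha = [0.9, 0.5, 0.3]
beta = [0.05, 0.10, 0.15]
rng = random.Random(0)
relevance = [int(rng.random() < 0.4) for _ in range(200)]
ctrs = simulate_ctrs(relevance, 1, alpha, beta, 500, rng)

w, mu, var, resp = fit_two_gaussians(ctrs)
high = 0 if mu[0] > mu[1] else 1  # component with the larger mean CTR
predicted = [int(r[high] > 0.5) for r in resp]
accuracy = sum(p == t for p, t in zip(predicted, relevance)) / len(relevance)

# Component means should approximate beta[1] and alpha[1] + beta[1];
# the affine correction maps the expected click of a relevant item back to 1.
print(sorted(mu), accuracy)
print(affine_correction(alpha[1] * 1 + beta[1], 1, alpha, beta))
```

Note that the mixture fit uses click data only: no relevance model is trained, yet the two component means land near β_k and α_k + β_k under the assumed click model.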
