Accelerated Convergence for Counterfactual Learning to Rank

Counterfactual Learning to Rank (LTR) algorithms learn a ranking model from logged user interactions, often collected using a production system. Such an offline learning approach has many benefits compared to an online one, but it is challenging because user feedback often contains high levels of bias. Unbiased LTR uses Inverse Propensity Scoring (IPS) to enable unbiased learning from logged user interactions. A major difficulty in applying Stochastic Gradient Descent (SGD) to counterfactual learning problems is the large variance introduced by the propensity weights. In this paper we show that the convergence rate of SGD with IPS-weighted gradients suffers from this variance: convergence is slow, especially when large IPS weights are present. To overcome this limitation, we propose a novel learning algorithm, called CounterSample, that has provably better convergence than standard IPS-weighted gradient descent methods. We prove that CounterSample converges faster and complement our theoretical findings with extensive experiments in a number of biased LTR scenarios -- across optimizers, batch sizes, and different degrees of position bias.
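
As a concrete illustration of the variance issue, the sketch below (NumPy) shows a standard IPS-weighted SGD step, followed by a weight-proportional sampling variant in the spirit of variance-reduced counterfactual learning. The linear scorer, logistic loss, and all variable names are illustrative assumptions rather than the paper's implementation; only the IPS re-weighting and the idea of sampling by weight reflect the setup described above, and the second loop is not claimed to be the exact CounterSample algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 1000, 10
X = rng.normal(size=(n, d))                 # feature vectors of logged (clicked) documents
y = np.ones(n)                              # clicks used as (biased) relevance labels
propensities = rng.uniform(0.05, 1.0, n)    # examination probabilities from a click model
ips_weights = 1.0 / propensities            # inverse propensity scores; can be very large

def grad_logistic(w, x, y):
    """Gradient of the logistic loss for a single example under a linear scorer."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y) * x

lr = 0.1

# Standard IPS-weighted SGD: sample uniformly, re-weight the gradient by the IPS weight.
# The estimator is unbiased, but its variance grows with the largest IPS weight.
w = np.zeros(d)
for step in range(1000):
    i = rng.integers(n)
    g = ips_weights[i] * grad_logistic(w, X[i], y[i])
    w -= lr / np.sqrt(step + 1) * g

# Weight-proportional sampling variant (an assumption about the general mechanism,
# not necessarily the paper's exact algorithm): draw examples with probability
# proportional to their IPS weight and rescale by the mean weight. The expected
# update is unchanged, but the per-step gradient magnitude stays bounded.
sample_probs = ips_weights / ips_weights.sum()
mean_weight = ips_weights.mean()

w = np.zeros(d)
for step in range(1000):
    i = rng.choice(n, p=sample_probs)
    g = mean_weight * grad_logistic(w, X[i], y[i])
    w -= lr / np.sqrt(step + 1) * g
```

In the first loop, a handful of very large IPS weights dominate the gradient noise and slow convergence; in the sampling-based variant, every step has a comparable magnitude while the expectation of the update is preserved.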
