Differentiable Unbiased Online Learning to Rank

Online Learning to Rank (OLTR) methods optimize rankers based on user interactions. State-of-the-art OLTR methods are built specifically for linear models, and their approaches do not extend well to non-linear models such as neural networks. We introduce an entirely novel approach to OLTR that constructs a weighted differentiable pairwise loss after each interaction: Pairwise Differentiable Gradient Descent (PDGD). PDGD breaks away from the traditional approach that relies on interleaving or multileaving and on extensive sampling of models to estimate gradients. Instead, its gradient is based on inferring preferences between document pairs from user clicks, and it can optimize any differentiable model. We prove that the gradient of PDGD is unbiased w.r.t. user document-pair preferences. Our experiments on the largest publicly available Learning to Rank (LTR) datasets show considerable and significant improvements under all levels of interaction noise. PDGD outperforms existing OLTR methods both in learning speed and in final convergence. Furthermore, unlike previous OLTR methods, PDGD allows non-linear models to be optimized effectively: our results show that a neural network reaches even better performance at convergence than a linear model. In summary, PDGD is an efficient and unbiased OLTR approach that provides a better user experience than previously possible.
