Balancing Speed and Quality in Online Learning to Rank for Information Retrieval

In Online Learning to Rank (OLTR) the aim is to find an optimal ranking model by interacting with users. When learning from user behavior, systems must interact with users while simultaneously learning from those interactions. Unlike other Learning to Rank (LTR) settings, existing research in this field has been limited to linear models. This is due to the speed-quality tradeoff that arises when selecting models: complex models are more expressive and can find the best rankings, but need more user interactions to do so, a requirement that risks frustrating users during training. Conversely, simpler models can be optimized on fewer interactions and thus provide a better user experience, but they will converge towards suboptimal rankings. This tradeoff creates a deadlock: a novel model cannot improve either the user experience or the final convergence point without sacrificing the other. Our contribution is twofold. First, we introduce a fast OLTR model called Sim-MGD that addresses the speed aspect of the speed-quality tradeoff. Sim-MGD ranks documents based on their similarities to reference documents. It converges rapidly and, hence, gives a better user experience, but it does not converge towards the optimal rankings. Second, we contribute Cascading Multileave Gradient Descent (C-MGD) for OLTR, which directly addresses the speed-quality tradeoff by using a cascade that combines the best of both worlds: fast learning and high-quality final convergence. C-MGD provides the better user experience of Sim-MGD while maintaining the same convergence as the state-of-the-art MGD model. This opens the door for future work to design new models for OLTR without having to deal with the speed-quality tradeoff.
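To make the two ideas concrete, here is a minimal sketch in Python (not taken from the paper; the function and parameter names `sim_scores`, `cascade_oltr`, `update_from_click_feedback`, `initialise_from`, and `switch_after` are hypothetical illustrations). The first function shows how a Sim-MGD-style ranker can score documents as a weighted combination of similarities to a small set of reference documents, so that only one weight per reference document has to be learned. The second sketches the cascade idea behind C-MGD under the simplifying assumption that the hand-over from the fast model to the expressive model happens after a fixed number of interactions.

```python
import numpy as np

def sim_scores(doc_features, reference_docs, weights):
    """Score documents as a weighted sum of cosine similarities to
    reference documents; the ranker learns one weight per reference
    document instead of one weight per raw feature."""
    def l2_normalize(m):
        norms = np.linalg.norm(m, axis=1, keepdims=True)
        return m / np.maximum(norms, 1e-12)

    # (n_docs, n_refs) matrix of cosine similarities.
    sims = l2_normalize(doc_features) @ l2_normalize(reference_docs).T
    return sims @ weights  # one relevance score per document


def cascade_oltr(fast_model, full_model, interactions, switch_after=1000):
    """Cascade sketch: learn with the fast model first, then hand its
    ranker over to the more expressive model and keep learning."""
    model = fast_model
    for t, interaction in enumerate(interactions):
        # Hypothetical online update from (multileaved) click feedback.
        model.update_from_click_feedback(interaction)
        if t == switch_after:
            # Hypothetical hand-over: initialise the expressive model from
            # the fast model's current ranker (e.g. by projecting
            # reference-document weights back into feature space).
            full_model.initialise_from(model)
            model = full_model
    return model
```

For instance, `sim_scores(np.random.rand(3, 5), np.random.rand(2, 5), np.array([0.7, 0.3]))` returns one score per document; the point is simply that the learned parameter vector has the length of the reference set, which is why the fast model can be optimized on fewer interactions. The fixed `switch_after` budget above is only a placeholder for the decision of when to hand over to the expressive model; in practice this decision would be based on detecting that the fast model has stopped improving.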
