Online Learning to Rank with Feedback at the Top

We consider an online learning to rank setting in which, at each round, an oblivious adversary generates a list of $m$ documents pertaining to a query, and the learner produces scores to rank the documents. The adversary then generates a relevance vector, and the learner updates its ranker according to the feedback received. We consider the setting where the feedback is restricted to the relevance levels of only the top $k$ documents in the ranked list, for $k \ll m$. However, the performance of the learner is judged on the unrevealed full relevance vectors, using an appropriate learning to rank loss function. We develop efficient algorithms for well-known losses in the pointwise, pairwise, and listwise families. We also prove that no online algorithm can have sublinear regret, with top-1 feedback, for any loss that is calibrated with respect to NDCG. We apply our algorithms to benchmark datasets, demonstrating efficient online learning of a ranking function from highly restricted feedback.
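The round-by-round protocol described above can be sketched in code. The following is a minimal illustration, not the paper's actual algorithm: it uses online gradient descent with a linear scorer and a pointwise squared loss, updating only on the relevance labels of the top-$k$ documents in the learner's own ranking. All names and parameter values (`eta`, `k`, the synthetic relevance model `w_star`) are hypothetical choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, k, eta, T = 10, 5, 2, 0.05, 200   # docs per query, features, feedback depth, step size, rounds

w = np.zeros(d)                          # learner's linear scorer: score_i = x_i . w
w_star = rng.normal(size=d)              # hidden ranker generating relevances (for simulation only)

for t in range(T):
    # Adversary reveals m document feature vectors for the query.
    X = rng.normal(size=(m, d))
    # Learner ranks documents by current scores.
    scores = X @ w
    ranking = np.argsort(-scores)
    # Hidden binary relevance vector; the learner never sees it in full.
    rel = (X @ w_star > 0).astype(float)
    # Only the relevances of the top-k ranked documents are revealed.
    top = ranking[:k]
    # Online gradient step on a pointwise squared loss over the revealed pairs.
    grad = 2 * (X[top] @ w - rel[top]) @ X[top]
    w -= eta * grad
```

Note that although the update touches only $k$ of the $m$ documents each round, the learner is still evaluated against the full (unrevealed) relevance vector; the paper's contribution is showing for which loss families such restricted feedback suffices for sublinear regret.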
