Online Learning to Rank with Top-k Feedback

We consider two settings of online learning to rank in which feedback is restricted to the top-ranked items. The problem is cast as an online game between a learner and a sequence of users over $T$ rounds. In both settings, the learner's objective is to present a ranked list of items to each user. The learner's performance is judged on the entire ranked list and the true relevances of the items. However, the learner receives highly restricted feedback at the end of each round, in the form of the relevances of only the top $k$ ranked items, where $k \ll m$ and $m$ is the number of items in the list. The first setting is \emph{non-contextual}: the list of items to be ranked is fixed. The second setting is \emph{contextual}: the lists of items vary, in the form of traditional query-document lists. No stochastic assumption is made on the process generating the item relevances and contexts. We provide efficient ranking strategies for both settings. The strategies achieve $O(T^{2/3})$ regret, where regret is defined with respect to popular ranking measures in the first setting and ranking surrogates in the second. We also provide impossibility results for certain ranking measures and a certain class of surrogates when feedback is restricted to the top-ranked item, i.e., $k=1$. We empirically demonstrate the performance of our algorithms on simulated and real-world datasets.
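The feedback protocol of the non-contextual setting can be illustrated with a minimal Python sketch. It is not the ranking strategy from the paper: it only simulates the game, scoring the learner on the full list while revealing the relevances of the top $k$ items. Everything here is an assumption made for illustration: the function names (`dcg`, `top_k_feedback`, `best_fixed_dcg`, `play`, `random_learner`), the choice of DCG with gain $2^r - 1$ as the ranking measure, and the randomly drawn relevances, which merely stand in for the arbitrary (non-stochastic) relevance sequence the abstract allows.

```python
import math
import random


def dcg(ranking, relevance):
    """DCG of a full ranking; ranking[pos] is the item shown at position pos."""
    return sum((2 ** relevance[item] - 1) / math.log2(pos + 2)
               for pos, item in enumerate(ranking))


def top_k_feedback(ranking, relevance, k):
    """What the learner observes: (item, relevance) for the top-k positions only."""
    return [(item, relevance[item]) for item in ranking[:k]]


def best_fixed_dcg(relevances):
    """Cumulative DCG of the best single ranking in hindsight.

    DCG is linear in the per-item gains, so sorting items by their total
    gain over all rounds gives the best fixed ranking.
    """
    m = len(relevances[0])
    total_gain = [sum(2 ** r[i] - 1 for r in relevances) for i in range(m)]
    order = sorted(range(m), key=lambda i: -total_gain[i])
    return sum(total_gain[item] / math.log2(pos + 2)
               for pos, item in enumerate(order))


def play(T, m, k, learner, seed=0):
    """Run T rounds; return (regret against the best fixed ranking, observations)."""
    rng = random.Random(seed)
    relevances, observed, learner_dcg = [], [], 0.0
    for _ in range(T):
        ranking = learner(observed, m, rng)  # learner commits to a full ranking
        # Placeholder for the adversary's choice (assumption: random grades 0..2).
        relevance = [rng.randint(0, 2) for _ in range(m)]
        learner_dcg += dcg(ranking, relevance)  # scored on the whole list, never shown
        observed.append(top_k_feedback(ranking, relevance, k))
        relevances.append(relevance)
    return best_fixed_dcg(relevances) - learner_dcg, observed


def random_learner(observed, m, rng):
    """Hypothetical baseline that ignores feedback and ranks uniformly at random."""
    ranking = list(range(m))
    rng.shuffle(ranking)
    return ranking


regret, observed = play(T=100, m=10, k=2, learner=random_learner)
```

Each entry of `observed` holds only $k$ (item, relevance) pairs, whereas `regret` is computed from the full relevance vectors, which is the gap between evaluation and feedback that the abstract describes.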
