Ranked bandits in metric spaces: learning diverse rankings over large document collections

Most learning-to-rank research assumes that the utilities of different documents are independent, which leads to learned ranking functions that return redundant results. The few approaches that avoid this either lack theoretical foundations or do not scale. We present a learning-to-rank formulation that optimizes the fraction of satisfied users, together with several scalable algorithms that explicitly take document similarity and ranking context into account. Our formulation is a non-trivial common generalization of two multi-armed bandit models from the literature: ranked bandits (Radlinski et al., 2008) and Lipschitz bandits (Kleinberg et al., 2008b). We present theoretical justifications for this approach, as well as a near-optimal algorithm. Our evaluation adds optimizations that improve empirical performance, and shows that our algorithms learn orders of magnitude faster than previous approaches.
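
To make the model concrete, below is a minimal sketch of the ranked-bandits skeleton (Radlinski et al., 2008) that our formulation generalizes: one bandit instance per rank, where a slot is rewarded only when its pick is the first result the user clicks, so lower slots learn to cover users that the higher slots miss. The UCB1 sub-bandit, the ranked_bandit_round helper, and the simulated click model are illustrative assumptions, not this paper's algorithms, which replace the finite-armed sub-bandits with similarity-aware (Lipschitz) exploration.

    import math
    import random

    class UCB1:
        """Standard UCB1 index policy (Auer et al., 2002) over a finite arm set."""
        def __init__(self, n_arms):
            self.counts = [0] * n_arms
            self.rewards = [0.0] * n_arms
            self.t = 0

        def select(self):
            self.t += 1
            for arm in range(len(self.counts)):   # play every arm once first
                if self.counts[arm] == 0:
                    return arm
            return max(
                range(len(self.counts)),
                key=lambda a: self.rewards[a] / self.counts[a]
                              + math.sqrt(2.0 * math.log(self.t) / self.counts[a]),
            )

        def update(self, arm, reward):
            self.counts[arm] += 1
            self.rewards[arm] += reward

    def ranked_bandit_round(slots, n_docs, first_click):
        """One round of the ranked-bandits skeleton: the bandit at rank i picks
        a document, and is rewarded only when its own pick is shown and is the
        first result the user clicks."""
        picks, ranking = [], []
        for bandit in slots:
            doc = bandit.select()
            picks.append(doc)
            if doc in ranking:  # duplicate pick: substitute an arbitrary fresh
                doc = next(d for d in range(n_docs) if d not in ranking)
            ranking.append(doc)
        clicked_rank = first_click(ranking)       # rank of first click, or None
        for i, bandit in enumerate(slots):
            # a substituted (duplicate) pick earns zero reward, as in RBA
            hit = clicked_rank == i and picks[i] == ranking[i]
            bandit.update(picks[i], 1.0 if hit else 0.0)
        return ranking

    # Toy usage: 50 documents, a ranking of length 5, and a simulated user
    # population in which each user clicks the first relevant document shown.
    if __name__ == "__main__":
        n_docs, k = 50, 5
        slots = [UCB1(n_docs) for _ in range(k)]
        def first_click(ranking):                 # users split between docs 3 and 7
            relevant = {3} if random.random() < 0.6 else {7}
            hits = [i for i, d in enumerate(ranking) if d in relevant]
            return hits[0] if hits else None
        for _ in range(2000):
            ranked_bandit_round(slots, n_docs, first_click)
        print("most-played doc per slot:",
              [max(range(n_docs), key=lambda d: b.counts[d]) for b in slots])

In this toy, the top slot should converge to the majority intent (document 3) and the second slot to the minority intent (document 7), since the second slot is only rewarded by users the first slot fails to satisfy; this is the diversity effect the formulation above optimizes directly.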

[1] Csaba Szepesvári et al. Online Optimization in X-Armed Bandits, 2008, NIPS.

[2] Adam Tauman Kalai et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.

[3] Elad Hazan et al. Better Algorithms for Benign Bandits, 2009, J. Mach. Learn. Res.

[4] Eli Upfal et al. Multi-Armed Bandits in Metric Spaces, 2008, STOC '08.

[5] Robert D. Kleinberg et al. Regret bounds for sleeping experts and bandits, 2010, Machine Learning.

[6] Baruch Awerbuch et al. Online linear optimization and adaptive routing, 2008, J. Comput. Syst. Sci.

[7] Andreas Krause et al. Online Learning of Assignments, 2009, NIPS.

[8] Gregory N. Hullender et al. Learning to rank using gradient descent, 2005, ICML.

[9] M. Woodroofe. A One-Armed Bandit Problem with a Concomitant Variable, 1979.

[10] D. Aldous. Exchangeability and related topics, 1985.

[11] J. Langford et al. The Epoch-Greedy algorithm for contextual multi-armed bandits, 2007, NIPS.

[12] Rémi Munos et al. Open Loop Optimistic Planning, 2010, COLT.

[13] Deepayan Chakrabarti et al. Bandits for Taxonomies: A Model-based Approach, 2007, SDM.

[14] Wei Chu et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.

[15] Tyler Lu et al. Showing Relevant Ads via Lipschitz Context Multi-Armed Bandits, 2010.

[16] Wei Chu et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2010, WSDM '11.

[17] Peter Auer et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.

[18] Nimrod Megiddo et al. Online Learning with Prior Knowledge, 2007, COLT.

[19] Rémi Munos et al. Online Learning in Adversarial Lipschitz Environments, 2010, ECML/PKDD.

[20] Stephen E. Robertson et al. SoftRank: optimizing non-smooth rank metrics, 2008, WSDM '08.

[21] Peter Auer et al. Improved Rates for the Stochastic Continuum-Armed Bandit Problem, 2007, COLT.

[22] Aleksandrs Slivkins et al. Contextual Bandits with Similarity Information, 2009, COLT.

[23] Jade Goldstein-Stewart et al. The use of MMR, diversity-based reranking for reordering documents and producing summaries, 1998, SIGIR '98.

[24] Gábor Lugosi et al. Prediction, learning, and games, 2006.

[25] Wei Chu et al. Gaussian Processes for Ordinal Regression, 2005, J. Mach. Learn. Res.

[26] H. Vincent Poor et al. Bandit problems with side observations, 2005, IEEE Transactions on Automatic Control.

[27] Elad Hazan et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization, 2008, COLT.

[28] Rémi Munos et al. Algorithms for Infinitely Many-Armed Bandits, 2008, NIPS.

[29] Aleksandrs Slivkins et al. Multi-armed bandits on implicit metric spaces, 2011, NIPS.

[30] Csaba Szepesvári et al. Bandit Based Monte-Carlo Planning, 2006, ECML.

[31] Thomas P. Hayes et al. The Price of Bandit Information for Online Optimization, 2007, NIPS.

[32] H. Robbins et al. Asymptotically efficient adaptive allocation rules, 1985.

[33] Rangarajan K. Sundaram. Generalized Bandit Problems, 2005.

[34] Peter Auer et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[35] Rémi Munos et al. Bandit Algorithms for Tree Search, 2007, UAI.

[36] Matthew J. Streeter et al. An Online Algorithm for Maximizing Submodular Functions, 2008, NIPS.

[37] Robert D. Kleinberg. Nearly Tight Bounds for the Continuum-Armed Bandit Problem, 2004, NIPS.

[38] Csaba Szepesvári et al. X-Armed Bandits, 2011, J. Mach. Learn. Res.

[39] Aleksandrs Slivkins et al. Sharp dichotomies for regret minimization in metric spaces, 2009, SODA '10.

[40] Thorsten Joachims et al. Optimizing search engines using clickthrough data, 2002, KDD.

[41] Satish Rao et al. A tight bound on approximating arbitrary metrics by tree metrics, 2003, STOC '03.

[42] Robert E. Schapire et al. Non-Stochastic Bandit Slate Problems, 2010, NIPS.

[43] R. Agrawal. The Continuum-Armed Bandit Problem, 1995, SIAM J. Control Optim.

[44] Robert Krauthgamer et al. Bounded geometries, fractals, and low-distortion embeddings, 2003, FOCS '03.

[45] Atsuyoshi Nakamura et al. Algorithms for Adversarial Bandit Problems with Multiple Plays, 2010, ALT.

[46] Yair Bartal. Probabilistic approximation of metric spaces and its algorithmic applications, 1996, FOCS '96.

[47] Andreas Krause et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.

[48] Wei Chu et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.

[49] Filip Radlinski et al. Learning diverse rankings with multi-armed bandits, 2008, ICML '08.

[50] K. Arrow et al. The New Palgrave Dictionary of Economics, 2020.

[51] Philippe Rigollet et al. Nonparametric Bandits with Covariates, 2010, COLT.

[52] Sébastien Bubeck et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.

[53] Peter Auer et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.