Bandits and Experts in Metric Spaces

In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of trials to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is well understood, bandit problems with large strategy sets are still a topic of active investigation, motivated by practical applications such as online auctions and web advertising. The goal of such research is to identify broad and natural classes of strategy sets and payoff functions that enable the design of efficient solutions. In this work, we study a general setting for the multi-armed bandit problem in which the strategies form a metric space and the payoff function satisfies a Lipschitz condition with respect to the metric. We refer to this problem as the Lipschitz MAB problem. We present a solution for the multi-armed bandit problem in this setting. That is, for every metric space we define an isometry invariant that bounds from below the performance of Lipschitz MAB algorithms for that metric space, and we present an algorithm that comes arbitrarily close to meeting this bound. Furthermore, our technique gives even better results for benign payoff functions. We also address the full-feedback (“best expert”) version of the problem, where after every round the payoffs from all arms are revealed.
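To make the setting concrete, the sketch below is an illustrative, simplified take on a zooming-style algorithm for a Lipschitz MAB instance on the unit interval with the absolute-value metric: arms are activated adaptively so that their confidence balls cover the strategy space, and each round plays an active arm with the highest optimistic index. The names (`zooming_bandit`, `payoff`), the constant in the confidence radius, and the randomized covering check are assumptions made for illustration; this is not the exact algorithm or analysis from the paper.

```python
import math
import random

def zooming_bandit(payoff, T, seed=0):
    """Simplified sketch of a zooming-style algorithm for the Lipschitz MAB
    problem on the metric space ([0, 1], |x - y|).

    `payoff(x)` should return a stochastic reward in [0, 1] whose mean is
    Lipschitz in x. Returns the total reward collected over T rounds.
    """
    rng = random.Random(seed)
    # Active arms: point in [0, 1] -> [number of plays, sum of rewards].
    active = {0.5: [0, 0.0]}

    def radius(n):
        # Confidence radius of an arm that has been played n times.
        return math.sqrt(8.0 * math.log(T + 1) / (n + 1))

    def index(item):
        # Optimistic index: empirical mean plus twice the confidence radius.
        x, (n, s) = item
        mean = s / n if n > 0 else 1.0
        return mean + 2.0 * radius(n)

    total = 0.0
    for _ in range(T):
        # Activation rule (a randomized stand-in for a covering check): if a
        # sampled point is not covered by any active arm's confidence ball,
        # activate it as a new arm.
        y = rng.random()
        if all(abs(y - x) > radius(n) for x, (n, _) in active.items()):
            active[y] = [0, 0.0]
        # Selection rule: play the active arm with the highest index, so the
        # algorithm "zooms in" on regions that look near-optimal.
        x, stats = max(active.items(), key=index)
        r = payoff(x)
        stats[0] += 1
        stats[1] += r
        total += r
    return total

# Hypothetical usage: Bernoulli rewards with 1-Lipschitz mean 0.9 - |x - 0.7|.
mu = lambda x: 0.9 - abs(x - 0.7)
print(zooming_bandit(lambda x: float(random.random() < mu(x)), T=10_000))
```

The design intuition is that playing the highest-index arm concentrates plays, and hence newly activated arms, near maxima of the mean payoff; this adaptivity is what lets zooming-style methods do better on benign payoff functions than a fixed uniform discretization of the interval would.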
