On Two Continuum Armed Bandit Problems in High Dimensions

We consider the continuum armed bandit problem in which the arms are indexed by a compact subset of $\mathbb{R}^d$. For large d, it is well known that mere smoothness assumptions on the reward functions lead to regret bounds that suffer from the curse of dimensionality. A typical way to tackle this in the literature has been to make further assumptions on the structure of the reward functions. In this work we assume the reward functions to be intrinsically of low dimension k ≪ d and consider two models: (i) the reward functions depend only on an unknown subset of k coordinate variables, and (ii) a generalization of (i) in which the reward functions depend on an unknown k-dimensional subspace of $\mathbb{R}^d$. By placing suitable smoothness assumptions on the rewards, we derive randomized algorithms for both problems that achieve nearly optimal regret bounds in terms of the number of rounds n.
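
For concreteness, the two low-dimensional reward models can be sketched as follows. This is a minimal illustrative example only: the link function g, the coordinate subset S, and the projection matrix A are hypothetical placeholders, not the constructions used in the paper.

```python
import numpy as np

# Illustrative sketch (not from the paper): the two low-dimensional reward
# models described in the abstract, with ambient dimension d and intrinsic
# dimension k << d. The choices of g, S, and A below are hypothetical.

d, k = 50, 2
rng = np.random.default_rng(0)

# A smooth reward defined on the k intrinsic variables.
def g(z):
    return 1.0 - np.sum((z - 0.5) ** 2)

# Model (i): the reward depends only on an unknown subset S of k coordinates.
S = rng.choice(d, size=k, replace=False)   # unknown to the learner

def reward_subset(x):
    """f(x) = g(x_S) for an arm x in [0, 1]^d."""
    return g(x[S])

# Model (ii): the reward depends on an unknown k-dimensional subspace,
# represented here by a k x d matrix A with orthonormal rows.
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
A = Q.T                                    # k x d, unknown to the learner

def reward_subspace(x):
    """f(x) = g(Ax) for an arm x in [0, 1]^d."""
    return g(A @ x)

# Querying an arm under either model.
x = rng.random(d)
print(reward_subset(x), reward_subspace(x))
```

In both models the learner only observes (noisy) values of f at the arms it pulls; the subset S, respectively the subspace spanned by the rows of A, is not revealed.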
