Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit

We consider a linear stochastic bandit problem where the dimension K of the unknown parameter θ is larger than the sampling budget n. In such cases, it is in general impossible to derive sub-linear regret bounds, since usual linear bandit algorithms have a regret in O(K√n). In this paper we assume that θ is S-sparse, i.e. has at most S non-zero components, and that the space of arms is the unit ball for the ‖·‖₂ norm. We combine ideas from Compressed Sensing and Bandit Theory and derive an algorithm with a regret bound in O(S√n). We detail an application to the problem of optimizing a function that depends on many variables but among which only a small number of them (initially unknown) are relevant.
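The recipe the abstract describes lends itself to a two-phase strategy: first spend a short exploration phase playing random arms and use a compressed-sensing style linear estimator with hard thresholding to locate the (at most S) relevant coordinates, then run a linear bandit restricted to the estimated support. The sketch below is a minimal illustration of that idea, not the paper's exact algorithm; the instance parameters (K, S, n, the noise level sigma, the exploration length n_explore) and the naive exploit step in phase 2 are hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem instance: K-dimensional parameter with S non-zero
# components, arms in the unit L2 ball, reward = <theta, arm> + Gaussian noise.
K, S, n = 1000, 5, 2000
theta = np.zeros(K)
support = rng.choice(K, size=S, replace=False)
theta[support] = rng.uniform(0.5, 1.0, size=S)

def pull(arm, sigma=0.1):
    """Noisy linear reward for playing `arm` (a unit-norm vector)."""
    return theta @ arm + sigma * rng.standard_normal()

# Phase 1: random exploration + hard thresholding (compressed-sensing step).
# Play random sign vectors scaled to unit norm, form a linear estimate of
# theta, then keep only the S largest coordinates in absolute value.
n_explore = 200
theta_hat = np.zeros(K)
for _ in range(n_explore):
    x = rng.choice([-1.0, 1.0], size=K) / np.sqrt(K)
    theta_hat += pull(x) * x
theta_hat *= K / n_explore                    # unbiased estimate of theta
S_hat = np.argsort(np.abs(theta_hat))[-S:]    # estimated support

# Phase 2: exploit on the estimated support. Here we just play the best
# unit-norm arm supported on S_hat according to the current estimate; a full
# bandit algorithm would keep refining the estimate with confidence bounds.
best_arm = np.zeros(K)
best_arm[S_hat] = theta_hat[S_hat] / np.linalg.norm(theta_hat[S_hat])
for _ in range(n - n_explore):
    pull(best_arm)

print("true support:     ", np.sort(support))
print("estimated support:", np.sort(S_hat))
```

Playing random ±1/√K sign vectors makes E[r·x] = θ/K, which is why rescaling the empirical average by K gives an unbiased estimate of θ; the exploit-only phase 2 above stands in for the linear bandit that the paper would run on the reduced arm space.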
