Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit

We consider a linear stochastic bandit problem where the dimension K of the unknown parameter θ is larger than the sampling budget n. In such cases, it is in general impossible to derive sub-linear regret bounds, since usual linear bandit algorithms have a regret in O(K√n). In this paper we assume that θ is S-sparse, i.e. has at most S non-zero components, and that the space of arms is the unit ball for the ‖·‖₂ norm. We combine ideas from Compressed Sensing and Bandit Theory and derive an algorithm with a regret bound in O(S√n). We detail an application to the problem of optimizing a function that depends on many variables but among which only a small number of them (initially unknown) are relevant.
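The recipe the abstract describes lends itself to a two-phase strategy: first spend a short exploration phase playing random arms and use a compressed-sensing style linear estimator with hard thresholding to locate the (at most S) relevant coordinates, then run a linear bandit restricted to the estimated support. The sketch below is a minimal illustration of that idea, not the paper's exact algorithm; the instance parameters (K, S, n, the noise level sigma, the exploration length n_explore) and the naive exploit step in phase 2 are hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem instance: K-dimensional parameter with S non-zero
# components, arms in the unit L2 ball, reward = <theta, arm> + Gaussian noise.
K, S, n = 1000, 5, 2000
theta = np.zeros(K)
support = rng.choice(K, size=S, replace=False)
theta[support] = rng.uniform(0.5, 1.0, size=S)

def pull(arm, sigma=0.1):
    """Noisy linear reward for playing `arm` (a unit-norm vector)."""
    return theta @ arm + sigma * rng.standard_normal()

# Phase 1: random exploration + hard thresholding (compressed-sensing step).
# Play random sign vectors scaled to unit norm, form a linear estimate of
# theta, then keep only the S largest coordinates in absolute value.
n_explore = 200
theta_hat = np.zeros(K)
for _ in range(n_explore):
    x = rng.choice([-1.0, 1.0], size=K) / np.sqrt(K)
    theta_hat += pull(x) * x
theta_hat *= K / n_explore                    # unbiased estimate of theta
S_hat = np.argsort(np.abs(theta_hat))[-S:]    # estimated support

# Phase 2: exploit on the estimated support. Here we just play the best
# unit-norm arm supported on S_hat according to the current estimate; a full
# bandit algorithm would keep refining the estimate with confidence bounds.
best_arm = np.zeros(K)
best_arm[S_hat] = theta_hat[S_hat] / np.linalg.norm(theta_hat[S_hat])
for _ in range(n - n_explore):
    pull(best_arm)

print("true support:     ", np.sort(support))
print("estimated support:", np.sort(S_hat))
```

Playing random ±1/√K sign vectors makes E[r·x] = θ/K, which is why rescaling the empirical average by K gives an unbiased estimate of θ; the exploit-only phase 2 above stands in for the linear bandit that the paper would run on the reduced arm space.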
